<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: DigitalOcean</title>
    <description>The latest articles on Forem by DigitalOcean (@digitalocean).</description>
    <link>https://forem.com/digitalocean</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F175%2F369f1227-0eac-4a88-8d3c-08851bf0b117.png</url>
      <title>Forem: DigitalOcean</title>
      <link>https://forem.com/digitalocean</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/digitalocean"/>
    <language>en</language>
    <item>
      <title>Build an End-to-End RAG Pipeline for LLM Applications</title>
      <dc:creator>DigitalOcean</dc:creator>
      <pubDate>Wed, 01 Apr 2026 01:06:34 +0000</pubDate>
      <link>https://forem.com/digitalocean/build-an-end-to-end-rag-pipeline-for-llm-applications-1330</link>
      <guid>https://forem.com/digitalocean/build-an-end-to-end-rag-pipeline-for-llm-applications-1330</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally written by Shaoni Mukherjee (Technical Writer)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.digitalocean.com/resources/articles/large-language-models" rel="noopener noreferrer"&gt;Large language models&lt;/a&gt; have transformed the way we build intelligent applications. &lt;a href="https://www.digitalocean.com/products/gradient/platform" rel="noopener noreferrer"&gt;Generative AI Models&lt;/a&gt; can summarize documents, generate code, and answer complex questions. However, they still face a major limitation: they cannot access private or continuously changing knowledge unless that information is incorporated into their training data.&lt;/p&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) addresses this limitation by combining information retrieval systems with generative AI models. Instead of relying entirely on the knowledge embedded in model weights, a RAG system retrieves relevant information from external sources and provides it to the language model during inference. The model then generates a response grounded in this retrieved context.&lt;/p&gt;

&lt;p&gt;An &lt;strong&gt;end-to-end RAG pipeline&lt;/strong&gt; refers to the full system that manages this process from beginning to end. It includes ingesting documents, transforming them into embeddings, storing them in a vector database, retrieving relevant information for a user query, and generating an answer using a large language model.&lt;/p&gt;

&lt;p&gt;This architecture is increasingly used in modern AI systems such as enterprise knowledge assistants, internal documentation search engines, developer copilots, and AI customer support tools. Organizations adopt RAG because it allows models to remain lightweight while still accessing large knowledge bases that change frequently.&lt;/p&gt;

&lt;p&gt;In this tutorial, we will walk through how to design and build a complete RAG pipeline. Along the way, we will explore architectural considerations, optimization strategies, and production challenges developers encounter when deploying retrieval-based AI systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmeku3hdzligtrv0nf06.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmeku3hdzligtrv0nf06.png" alt="Knowledge and Vector Storage for RAG pipeline" width="800" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RAG combines retrieval and generation for more accurate AI systems&lt;/strong&gt;: Retrieval-Augmented Generation (RAG) bridges the gap between static language models and dynamic, real-world data. Instead of relying only on pre-trained knowledge, it fetches relevant information at runtime and uses it to generate answers. This makes responses more accurate, up-to-date, and context-aware. It is especially useful for applications like chatbots, internal knowledge assistants, and search systems. Overall, RAG helps reduce hallucinations and improves trust in AI-generated outputs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector embeddings are the foundation of semantic search in RAG&lt;/strong&gt;: Embeddings convert text into numerical vectors that capture meaning rather than exact wording. This allows the system to understand similarity between queries and documents even if they use different phrasing. As a result, retrieval becomes more intelligent and context-driven instead of keyword-based. High-quality embedding models like &lt;code&gt;text-embedding-3-large&lt;/code&gt; or &lt;code&gt;bge-large-en&lt;/code&gt; can significantly improve retrieval performance. Choosing the right embedding model directly impacts the overall quality of your RAG system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Each component of the pipeline plays a critical role&lt;/strong&gt;: A RAG system is made up of multiple steps, including ingestion, chunking, embedding, storage, retrieval, and generation. If any one component is poorly optimized, it can affect the entire pipeline’s performance. For example, bad chunking can lead to irrelevant retrieval, even if your embedding model is strong. Similarly, weak retrieval will result in poor answers, no matter how powerful the language model is. This is why building an end-to-end RAG system requires careful design and tuning at every stage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation is essential for building reliable RAG applications&lt;/strong&gt;: It is not enough to just build a RAG pipeline, but you must also evaluate how well it performs. This includes checking whether the system retrieves the correct documents and whether the generated answers are accurate and grounded. Metrics like precision and recall help measure retrieval quality, while human evaluation helps assess answer correctness. Creating benchmark datasets with known questions and answers makes it easier to track improvements over time. Continuous evaluation ensures your system remains reliable in production.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Understanding the RAG System Architecture
&lt;/h2&gt;

&lt;p&gt;Before implementing the pipeline, it is important to understand how the different components interact. A typical &lt;strong&gt;RAG system architecture&lt;/strong&gt; can be divided into two major workflows: the indexing pipeline and the retrieval pipeline.&lt;/p&gt;

&lt;p&gt;The indexing pipeline prepares the knowledge base so that it can be searched efficiently. During this stage, documents are ingested, cleaned, split into chunks, converted into embeddings, and stored in a &lt;a href="https://www.digitalocean.com/community/tutorials/beyond-vector-databases-rag-without-embeddings" rel="noopener noreferrer"&gt;vector database&lt;/a&gt;. This process is usually executed offline or periodically when new data becomes available.&lt;/p&gt;

&lt;p&gt;The retrieval pipeline operates during inference. When a user asks a question, the system converts that query into an &lt;a href="https://www.digitalocean.com/community/tutorials/beyond-vector-databases-rag-without-embeddings" rel="noopener noreferrer"&gt;embedding&lt;/a&gt;, searches the vector database for semantically similar chunks, and provides those retrieved passages to the language model. The model then generates a response using both the query and the contextual information.&lt;/p&gt;

&lt;p&gt;A simplified representation of the pipeline looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Document Sources
       (PDFs, Docs, APIs, Knowledge Base)
                        |
                        v
               Document Processing
                        |
                        v
                  Text Chunking
                        |
                        v
               Embedding Generation
                        |
                        v
               Vector Database Index
                        |
                        v
User Query → Query Embedding → Similarity Search
                        |
                        v
             Retrieved Context Chunks
                        |
                        v
                  LLM Generation
                        |
                        v
                  Final Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This architecture enables the system to retrieve information dynamically rather than relying solely on model training.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy49fm6102laxs8huvmqn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy49fm6102laxs8huvmqn.png" alt="RAG System Architecture" width="750" height="676"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Ingestion in a RAG Pipeline
&lt;/h2&gt;

&lt;p&gt;The first stage of the pipeline involves gathering the data that the AI system will use as its knowledge source. In many real-world applications, this information is distributed across multiple systems. Organizations may store documentation in internal knowledge bases, PDFs, wikis, product manuals, or database records.&lt;/p&gt;

&lt;p&gt;The ingestion stage extracts textual information from these sources and prepares it for processing. Depending on the data format, ingestion may involve parsing HTML pages, converting PDFs to text, or querying APIs to retrieve structured records.&lt;/p&gt;

&lt;p&gt;At this stage, developers often implement preprocessing steps such as removing redundant formatting, normalizing whitespace, and filtering irrelevant sections. These steps are important because retrieval performance strongly depends on the quality of the text data stored in the system.&lt;/p&gt;

&lt;p&gt;For enterprise knowledge retrieval systems, ingestion pipelines are usually automated and scheduled. For example, an internal documentation chatbot might update its &lt;a href="https://docs.digitalocean.com/products/gradient-ai-platform/how-to/create-manage-agent-knowledge-bases/" rel="noopener noreferrer"&gt;knowledge base&lt;/a&gt; daily by ingesting the latest documentation changes from a repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  Text Chunking: Preparing Documents for Retrieval
&lt;/h2&gt;

&lt;p&gt;After ingestion, documents must be divided into smaller pieces before they can be embedded. This step, known as &lt;a href="https://docs.digitalocean.com/products/gradient-ai-platform/concepts/chunking-strategies/" rel="noopener noreferrer"&gt;text chunking&lt;/a&gt;, plays a critical role in the overall performance of the RAG pipeline.&lt;/p&gt;

&lt;p&gt;Large documents cannot be embedded effectively because embedding models have token limits and because large chunks reduce retrieval precision. Instead, documents are broken into manageable segments that capture a coherent piece of information.&lt;/p&gt;

&lt;p&gt;Chunk size is typically chosen between 200 and 500 tokens. Smaller chunks provide more precise retrieval results, while larger chunks preserve more contextual information. Many production pipelines use overlapping chunks to prevent important sentences from being split across boundaries.&lt;/p&gt;

&lt;p&gt;The following diagram illustrates how a long document is transformed into multiple overlapping chunks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Original Document
-------------------------------------------------------
| Paragraph 1 | Paragraph 2 | Paragraph 3 | Paragraph 4 |
-------------------------------------------------------

After Chunking
-------------------------------------------------------
| Chunk 1 | Chunk 2 | Chunk 3 | Chunk 4 | Chunk 5 |
-------------------------------------------------------

Chunk Example
Chunk 1: Paragraph 1 + part of Paragraph 2
Chunk 2: Paragraph 2 + part of Paragraph 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Choosing an effective chunking strategy significantly improves retrieval accuracy because each chunk represents a focused semantic concept.&lt;/p&gt;

&lt;h2&gt;
  
  
  Embedding Generation
&lt;/h2&gt;

&lt;p&gt;Once documents are divided into chunks, each chunk must be converted into a numerical representation called an embedding. Embeddings transform text into high-dimensional vectors that capture semantic meaning.&lt;/p&gt;

&lt;p&gt;For example, two sentences that express similar ideas will produce vectors that are close to each other in vector space. This property allows vector databases to retrieve semantically related text even when the wording differs.&lt;/p&gt;

&lt;p&gt;Embedding models are trained using large datasets and &lt;a href="https://www.digitalocean.com/community/tutorials/transformers-attention-is-all-you-need" rel="noopener noreferrer"&gt;transformer architectures&lt;/a&gt;. When a chunk is processed, the model generates a vector with hundreds or thousands of dimensions. These vectors serve as the foundation for similarity search.&lt;/p&gt;

&lt;p&gt;Embedding generation occurs during both indexing and retrieval. During indexing, embeddings are generated for each document chunk. During retrieval, the user’s query is also converted into an embedding so that it can be compared against stored vectors.&lt;/p&gt;

&lt;p&gt;This mechanism allows the RAG system to perform &lt;strong&gt;semantic search&lt;/strong&gt;, which is far more powerful than traditional keyword matching.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vector Embedding
&lt;/h2&gt;

&lt;p&gt;Vector embeddings are dense numerical representations of data, which can be text, images, or audio. Vector embeddings are used to capture the semantic meaning of the data in a high-dimensional vector space. In an end-to-end RAG pipeline, embeddings are used to convert both documents and user queries into vectors so that similarity between them can be measured using metrics like cosine similarity. This allows the system to retrieve context based on meaning rather than exact keyword matches, making responses more accurate and relevant.&lt;/p&gt;

&lt;p&gt;For example, even if a query doesn’t contain the same words as a document, embeddings can still identify it as relevant if the underlying intent is similar. Popular embedding models used in RAG systems include &lt;a href="https://developers.openai.com/api/docs/models/text-embedding-3-large" rel="noopener noreferrer"&gt;text-embedding-3-large&lt;/a&gt;, &lt;a href="https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2" rel="noopener noreferrer"&gt;all-MiniLM-L6-v2&lt;/a&gt;, &lt;a href="https://huggingface.co/BAAI/bge-large-en" rel="noopener noreferrer"&gt;bge-large-en&lt;/a&gt;, and &lt;a href="https://huggingface.co/intfloat/e5-large-v2" rel="noopener noreferrer"&gt;e5-large-v2&lt;/a&gt;, each offering different trade-offs in performance, cost, and deployment flexibility.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fixgailx5konq18wkv1ev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fixgailx5konq18wkv1ev.png" alt="Vector Embedding Workflow" width="800" height="428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Storing Vectors in a Database
&lt;/h2&gt;

&lt;p&gt;After embeddings are created, they must be stored in a specialized database capable of performing fast similarity searches. These systems are known as &lt;strong&gt;vector databases&lt;/strong&gt; and form the core of the RAG retrieval infrastructure.&lt;/p&gt;

&lt;p&gt;Unlike traditional databases that index numeric or textual fields, vector databases are optimized to search across high-dimensional vectors. They use approximate nearest neighbor algorithms to identify vectors that are closest to a query embedding.&lt;/p&gt;

&lt;p&gt;The structure of a stored vector typically includes the embedding itself, the original text chunk, and metadata describing the source of the information. Metadata can include document identifiers, timestamps, or categories that allow filtering during retrieval.&lt;/p&gt;

&lt;p&gt;A simplified representation of vector storage looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Vector Database

ID     Vector Embedding        Text Chunk
---------------------------------------------------------
1   [0.12, -0.44, 0.92...]   "RAG combines retrieval..."
2   [0.55, 0.33, -0.14...]   "Vector databases enable..."
3   [-0.77, 0.08, 0.62...]   "Embeddings represent..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Popular vector database technologies include managed services and open-source platforms designed specifically for AI workloads. The choice often depends on scale, infrastructure preferences, and latency requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  Retrieval in a RAG Pipeline
&lt;/h2&gt;

&lt;p&gt;When a user submits a question, the system begins the retrieval stage. The query is first converted into an embedding using the same embedding model used during indexing. Maintaining the same embedding model is important because similarity comparisons rely on consistent vector representations.&lt;/p&gt;

&lt;p&gt;The query embedding is then sent to the vector database. The database performs a similarity search to find document chunks whose embeddings are closest to the query vector. These chunks represent the pieces of information most relevant to the user’s question.&lt;/p&gt;

&lt;p&gt;The retrieved chunks are then combined and passed to the language model as contextual input. The model uses this context to generate a response grounded in actual documents rather than relying solely on its training data.&lt;/p&gt;

&lt;p&gt;This process ensures that answers are based on real knowledge sources and can be updated whenever the underlying documents change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Generation with a Large Language Model
&lt;/h2&gt;

&lt;p&gt;The final stage of the pipeline involves generating a response using a language model. At this point, the system already has two pieces of information: the user’s question and the retrieved context.&lt;/p&gt;

&lt;p&gt;These elements are combined into a prompt that instructs the model to answer the question using the provided information. Because the context is derived from authoritative documents, the model’s output becomes significantly more reliable and factual.&lt;/p&gt;

&lt;p&gt;This stage also allows developers to control how responses are generated. Prompts may instruct the model to summarize information, provide citations, or answer in a specific format. Some systems also include guardrails that prevent hallucinations or restrict responses to retrieved information.&lt;/p&gt;

&lt;p&gt;For example, if a user asks a question, the system first pulls the most relevant text from your knowledge base, then the LLM rewrites that content into a helpful answer, making it more conversational, structured, and easy to understand. This step is what makes RAG powerful, because it combines &lt;strong&gt;accurate, up-to-date information&lt;/strong&gt; with &lt;strong&gt;fluent natural language generation&lt;/strong&gt;, reducing hallucinations and improving answer quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code Demo: Building a Simple End-to-End RAG Pipeline
&lt;/h2&gt;

&lt;p&gt;The following example demonstrates how a basic &lt;strong&gt;RAG pipeline for LLM applications&lt;/strong&gt; can be implemented in Python. The example uses document loading, chunking, embeddings, and a vector database to create a minimal working pipeline.&lt;/p&gt;

&lt;h4&gt;
  
  
  Install dependencies
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install langchain chromadb sentence-transformers openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Load documents
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.document_loaders import TextLoader

loader = TextLoader("knowledge_base.txt")
documents = loader.load()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Split documents into chunks
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
   chunk_size=500,
   chunk_overlap=100
)

chunks = splitter.split_documents(documents)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Generate embeddings
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
   model_name="sentence-transformers/all-MiniLM-L6-v2"
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Store vectors
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.vectorstores import Chroma

vector_db = Chroma.from_documents(
   documents=chunks,
   embedding=embeddings
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Retrieval and generation
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI()

qa_chain = RetrievalQA.from_chain_type(
   llm=llm,
   retriever=vector_db.as_retriever()
)

response = qa_chain.run(
   "What is retrieval augmented generation?"
)

print(response)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This simple implementation demonstrates how document retrieval and language models can be combined into a working RAG system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Evaluating RAG System Performance
&lt;/h2&gt;

&lt;p&gt;Evaluating a RAG system is important because you need to be sure that it is not only retrieving the right information but also generating correct and useful answers from it. In simple terms, a good RAG pipeline should &lt;strong&gt;find the right content&lt;/strong&gt; and then &lt;strong&gt;explain it correctly&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;First, let’s look at &lt;strong&gt;retrieval evaluation&lt;/strong&gt;. This checks whether the system is pulling the right documents from your database. Imagine you have a knowledge base about cloud services, and a user asks, &lt;em&gt;“How can I run AI models on GPUs?”&lt;/em&gt;. If your system retrieves documents about &lt;a href="https://www.digitalocean.com/products/gradient/gpu-droplets" rel="noopener noreferrer"&gt;GPU Droplets&lt;/a&gt; or AI infrastructure, that’s a good sign. But if it returns unrelated content like pricing pages or networking docs, retrieval quality is poor. Metrics like &lt;em&gt;recall&lt;/em&gt; (did we find all relevant documents?) and &lt;em&gt;precision&lt;/em&gt; (were the retrieved documents actually relevant?) help measure this. For example, if 5 documents are relevant but your system only retrieves 2, recall is low.&lt;/p&gt;

&lt;p&gt;Next is &lt;strong&gt;generation evaluation&lt;/strong&gt;, which focuses on the answer produced by the language model. Even if retrieval is correct, the model (like GPT-4 or Llama 3) might still generate incomplete or incorrect responses. For instance, if the retrieved document clearly says &lt;em&gt;“GPU droplets support CUDA workloads”&lt;/em&gt;, but the model responds with &lt;em&gt;“GPU support is limited”&lt;/em&gt;, that’s a problem. This is why human evaluation is often needed to check if the answer is &lt;strong&gt;factually correct, complete, and grounded in the provided context&lt;/strong&gt;. Automated metrics struggle to detect things like s or subtle inaccuracies.&lt;/p&gt;

&lt;p&gt;To make evaluation consistent, teams usually create an &lt;strong&gt;evaluation dataset&lt;/strong&gt;. This is a collection of sample questions along with their correct answers and sometimes the expected source documents. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Question: &lt;em&gt;“What are GPU droplets used for?”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Expected answer: &lt;em&gt;“They are used for AI/ML workloads, training models, and high-performance computing.”&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can then run your RAG system on this dataset and compare its answers against the expected ones. Over time, this helps you track improvements, catch errors, and tune your system (for example, by improving chunking, choosing a better embedding model, or adjusting prompts).&lt;/p&gt;

&lt;p&gt;In practice, strong RAG evaluation combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval checks&lt;/strong&gt;: Did we fetch the right information?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Answer checks&lt;/strong&gt;: Did we explain it correctly?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuous testing&lt;/strong&gt;: Are we improving over time?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This ensures your RAG pipeline is reliable, accurate, and ready for real-world use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaling and Production Considerations
&lt;/h2&gt;

&lt;p&gt;Prototype RAG pipelines often work well with small datasets, but production deployments introduce additional challenges. Large organizations may store millions of document chunks, requiring scalable infrastructure for indexing and retrieval.&lt;/p&gt;

&lt;p&gt;Latency also becomes an important concern. Vector searches, embedding generation, and LLM inference all contribute to response time. Developers must carefully optimize these components to ensure interactive performance.&lt;/p&gt;

&lt;p&gt;Production systems frequently incorporate caching layers, query batching, and efficient indexing strategies. Monitoring tools are also used to track retrieval accuracy, system latency, and cost per query.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost and Latency Optimization
&lt;/h2&gt;

&lt;p&gt;Operating a &lt;a href="https://www.digitalocean.com/community/conceptual-articles/rag-ai-agents-agentic-rag-comparative-analysis" rel="noopener noreferrer"&gt;RAG pipeline&lt;/a&gt; at scale can become expensive if not carefully optimized. Each query may require embedding generation, vector search, and language model inference.&lt;/p&gt;

&lt;p&gt;Several strategies help reduce these costs. Caching responses for frequently asked questions prevents repeated model inference. Limiting the number of retrieved chunks also reduces token usage and speeds up generation.&lt;/p&gt;

&lt;p&gt;Another important technique is &lt;strong&gt;re-ranking&lt;/strong&gt;. Instead of sending many retrieved documents to the language model, a re-ranking model selects the most relevant passages before generation. This improves response quality while reducing computational overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG vs Fine-Tuning
&lt;/h2&gt;

&lt;p&gt;A common question among developers is whether to use retrieval-augmented generation or fine-tuning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/fine-tuning-llms-on-budget-digitalocean-gpu" rel="noopener noreferrer"&gt;Fine-tuning&lt;/a&gt; changes a model’s internal weights by training it on additional datasets. This approach works well for teaching models specific styles or behaviors. However, it is less effective for continuously changing knowledge because retraining the model is expensive and time-consuming.&lt;/p&gt;

&lt;p&gt;RAG systems take a different approach by keeping the model unchanged while retrieving knowledge dynamically. This makes them ideal for applications where information changes frequently, such as product documentation or customer support knowledge bases.&lt;/p&gt;

&lt;p&gt;For most knowledge-intensive applications, RAG provides a more flexible and maintainable solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building an end-to-end RAG pipeline is about combining the strengths of retrieval systems and large language models to create applications that are both accurate and context-aware. Instead of relying only on pre-trained knowledge, a RAG system can fetch relevant information in real time and use models like GPT-4 or Llama 3 to generate clear, human-like responses grounded in that data. In this article, we understood each of the steps used to create the RAG pipeline from data ingestion and chunking to vector embeddings, retrieval, and response generation. Each component plays a critical role, and even small improvements (like better chunking strategies or choosing the right embedding model) can significantly impact overall performance. As organizations continue to build AI-powered applications, RAG stands out as a practical and scalable approach for use cases like chatbots, knowledge assistants, and document search. By continuously evaluating and refining your pipeline, you can create systems that are not only intelligent but also reliable and production-ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/resources/articles/rag" rel="noopener noreferrer"&gt;What is Retrieval Augmented Generation (RAG)? The Key to Smarter, More Accurate AI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/conceptual-articles/rag-ai-agents-agentic-rag-comparative-analysis" rel="noopener noreferrer"&gt;RAG, AI Agents, and Agentic RAG: An In-Depth Review and Comparative Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/beyond-vectors-knowledge-graphs-and-rag" rel="noopener noreferrer"&gt;Beyond Vectors - Knowledge Graphs &amp;amp; RAG Using Gradient&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.langchain.com/" rel="noopener noreferrer"&gt;Langchain docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>rag</category>
      <category>tutorial</category>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>Tutorial: Deploy NVIDIA's NemoClaw in One Click</title>
      <dc:creator>DigitalOcean</dc:creator>
      <pubDate>Mon, 23 Mar 2026 18:28:14 +0000</pubDate>
      <link>https://forem.com/digitalocean/how-to-set-up-nemoclaw-on-a-digitalocean-droplet-with-1-click-1lo4</link>
      <guid>https://forem.com/digitalocean/how-to-set-up-nemoclaw-on-a-digitalocean-droplet-with-1-click-1lo4</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally written by Amit Jotwani (Staff Developer Advocate at DigitalOcean)&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Key Takeaways
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;NemoClaw is an open-source stack from NVIDIA designed to help developers run OpenClaw securely. &lt;/li&gt;
&lt;li&gt;DigitalOcean offers NemoClaw 1-Click Droplets that enable you to set up this stack on a CPU-optimized virtual machine and run NemoClaw. &lt;/li&gt;
&lt;li&gt;This tutorial illustrates how to SSH into your Droplet, configure inference settings and policies, connect to NemoClaw, and effectively reconnect after the initial setup.
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;At GTC 2026, NVIDIA announced &lt;a href="https://nvidianews.nvidia.com/news/nvidia-announces-nemoclaw" rel="noopener noreferrer"&gt;NemoClaw&lt;/a&gt;, an open-source stack that makes it easy to run &lt;a href="https://openclaw.com/" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; autonomous agents securely. OpenClaw is an open-source agent platform that Jensen Huang called “the operating system for personal AI.” We covered &lt;a href="https://www.digitalocean.com/community/tutorials/how-to-run-openclaw" rel="noopener noreferrer"&gt;how to run OpenClaw on a Droplet&lt;/a&gt; in an earlier tutorial. NemoClaw takes a different approach — it wraps OpenClaw with sandboxing, security policies, and inference routing through NVIDIA’s cloud.&lt;/p&gt;

&lt;p&gt;NemoClaw is still in alpha, so expect rough edges. Interfaces may change, features might be incomplete, and things could break. But if you’re curious to try it out or just want to see what NVIDIA’s vision for agents looks like, this tutorial will get you up and running on a DigitalOcean Droplet in under 10 minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before you begin, you’ll need:&lt;/p&gt;

&lt;p&gt;A DigitalOcean account (&lt;a href="https://cloud.digitalocean.com/registrations/new" rel="noopener noreferrer"&gt;sign up here&lt;/a&gt; if you don’t have one)&lt;br&gt;
An NVIDIA account to generate an API key at &lt;a href="https://build.nvidia.com/settings/api-keys" rel="noopener noreferrer"&gt;build.nvidia.com&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 1 - Create a Droplet from the Marketplace
&lt;/h2&gt;

&lt;p&gt;Head to the NemoClaw 1-Click Droplet on the DigitalOcean Marketplace. Click &lt;strong&gt;Create NemoClaw Droplet&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When configuring the Droplet, select the &lt;strong&gt;CPU-Optimized&lt;/strong&gt; plan with &lt;strong&gt;Premium Intel&lt;/strong&gt;. You’ll want the option with &lt;strong&gt;32 GB of RAM and 16 CPUs&lt;/strong&gt;. NemoClaw runs Docker containers, a Kubernetes cluster (k3s), and the OpenShell gateway, so it needs the headroom.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkf3xcfukamdj8d0kidh1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkf3xcfukamdj8d0kidh1.png" alt="Droplet Configuration Settings" width="800" height="691"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pick a data center region near you, add your SSH key, and hit &lt;strong&gt;Create Droplet&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Heads up: This Droplet costs $336/mo, so make sure to destroy it when you’re done experimenting. It adds up fast if you forget about it.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 2 - SSH into the Droplet
&lt;/h2&gt;

&lt;p&gt;Once your Droplet is ready, SSH in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ssh"&gt;&lt;code&gt;&lt;span class="k"&gt;ssh&lt;/span&gt; root@your_server_ip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’ll see the usual Ubuntu login banner, and then the NemoClaw onboarding wizard will kick off automatically. It runs through a series of preflight checks, making sure Docker is running, installing the OpenShell CLI, and spinning up the gateway. You’ll see checkmarks fly by as each step completes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9zq2u6f7fiedqcrj91w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9zq2u6f7fiedqcrj91w.png" alt="Onboarding checks" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3 - Walk Through the Onboard Wizard
&lt;/h2&gt;

&lt;p&gt;The onboarding wizard will ask you a few things. Here’s what to do at each prompt:&lt;/p&gt;

&lt;h3&gt;
  
  
  Sandbox Name
&lt;/h3&gt;

&lt;p&gt;The first prompt asks for a sandbox name. Just press &lt;strong&gt;Enter&lt;/strong&gt; to accept the default (&lt;code&gt;my-assistant&lt;/code&gt;). The wizard will then create the sandbox, build the container image, and push it to the gateway. This takes a couple of minutes, and you’ll see it run through about 20 steps as it builds and uploads everything.&lt;/p&gt;

&lt;h3&gt;
  
  
  NVIDIA API Key
&lt;/h3&gt;

&lt;p&gt;Once the sandbox is ready, the wizard asks for your NVIDIA API key. In this setup, inference is routed through NVIDIA’s cloud using the &lt;code&gt;nvidia/nemotron-3-super-120b-a12b&lt;/code&gt; model, so it needs a key to authenticate.&lt;/p&gt;

&lt;p&gt;To get your key, head to &lt;a href="https://build.nvidia.com/settings/api-keys" rel="noopener noreferrer"&gt;build.nvidia.com/settings/api-keys&lt;/a&gt;, sign in, and click &lt;strong&gt;Generate API Key&lt;/strong&gt;. Give it a name, pick an expiration, and hit &lt;strong&gt;Generate Key&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fffkfetz0bbqstz3ea9a3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fffkfetz0bbqstz3ea9a3.png" alt="NVIDIA API Key generation" width="800" height="569"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Copy the key (it starts with &lt;code&gt;nvapi-&lt;/code&gt;), paste it into the terminal prompt, and press &lt;strong&gt;Enter&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcisdgrdv3g5qk78pn0ti.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcisdgrdv3g5qk78pn0ti.png" alt="NVIDIA API key integration" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The wizard saves the key to &lt;code&gt;~/.nemoclaw/credentials.json&lt;/code&gt; and sets up the inference provider. You’ll see it confirm the model and create an inference route.&lt;/p&gt;

&lt;h3&gt;
  
  
  Policy Presets
&lt;/h3&gt;

&lt;p&gt;After the inference setup, NemoClaw sets up OpenClaw inside the sandbox and then asks about policy presets. You’ll see a list of available presets including Discord, Docker Hub, Hugging Face, Jira, npm, PyPI, Slack, and more. These control what external services the agent is allowed to reach.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyzr3abqzhmec2dawimv2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyzr3abqzhmec2dawimv2.png" alt="Onboarding policy presets" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At the bottom, the wizard asks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Apply suggested presets (pypi, npm)? [Y/n/list]:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Type &lt;code&gt;n&lt;/code&gt; and press &lt;strong&gt;Enter&lt;/strong&gt;. These presets grant the sandbox network access to package registries, which you don’t need for a basic setup. You can always add them later if your agent needs to install packages.&lt;/p&gt;

&lt;p&gt;Once onboarding finishes, you’ll see a clean summary with your sandbox details and the commands you’ll need going forward:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxv3xi2k87w2wyolgqfku.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxv3xi2k87w2wyolgqfku.png" alt="Onboarding complete" width="800" height="530"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sandbox    my-assistant (Landlock + seccomp + netns)
Model      nvidia/nemotron-3-super-120b-a12b (NVIDIA Cloud API)
NIM        not running

Run:       nemoclaw my-assistant connect
Status:    nemoclaw my-assistant status
Logs:      nemoclaw my-assistant logs --follow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4 - Connect to NemoClaw
&lt;/h2&gt;

&lt;p&gt;Now for the fun part. Connect to your sandbox.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nemoclaw my-assistant connect
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This drops you into a shell inside the sandboxed environment. From here, launch the OpenClaw TUI (terminal user interface):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw tui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it. You should see the OpenClaw chat interface come up. The agent will greet you and introduce itself, ready to chat.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsc2n1gyftn9k6eibpy34.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsc2n1gyftn9k6eibpy34.png" alt="OpenClaw TUI" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Type a message and hit &lt;strong&gt;Enter&lt;/strong&gt;. You’re now talking to an AI agent running inside a secure, sandboxed environment on your own Droplet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reconnecting After a New SSH Session
&lt;/h2&gt;

&lt;p&gt;If you close your terminal and SSH back into the Droplet later, you’ll find that &lt;code&gt;nemoclaw&lt;/code&gt; and related commands aren’t available. That’s because the onboarding script installed everything through nvm in a separate shell, and that doesn’t carry over to new sessions.&lt;/p&gt;

&lt;p&gt;Run this once to fix it permanently. It adds nvm to your &lt;code&gt;.bashrc&lt;/code&gt; so it loads automatically on every login:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'export NVM_DIR="$HOME/.nvm"'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/.bashrc &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'[ -s "$NVM_DIR/nvm.sh" ] &amp;amp;&amp;amp; \. "$NVM_DIR/nvm.sh"'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/.bashrc &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'[ -s "$NVM_DIR/bash_completion" ] &amp;amp;&amp;amp; \. "$NVM_DIR/bash_completion"'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/.bashrc &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; ~/.bashrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then reconnect to your sandbox and launch the TUI the same way as before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nemoclaw my-assistant connect
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw tui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7v53w5esybr80ypsbwtt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7v53w5esybr80ypsbwtt.png" alt="Sandbox reload" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Everything picks up right where you left off. Your sandbox and agent are still running.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;By default, the sandbox has limited network access, so the agent can’t reach external services out of the box. To unlock more capabilities - like connecting to Slack, GitHub, or pulling packages from PyPI - you’ll want to configure policy presets. Check the NemoClaw documentation for the full list of available integrations and how to set them up.&lt;/p&gt;

&lt;p&gt;NemoClaw is still very early, so expect things to be rough around the edges. But if you want to get a feel for where always-on agents are headed, this is a good way to start poking around.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://marketplace.digitalocean.com/apps/nemoclaw-alpha" rel="noopener noreferrer"&gt;NemoClaw 1-Click Droplet on DigitalOcean Marketplace&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/NVIDIA/NemoClaw/" rel="noopener noreferrer"&gt;NemoClaw GitHub Repo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.nvidia.com/nemoclaw/latest/" rel="noopener noreferrer"&gt;NemoClaw Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nvidianews.nvidia.com/news/nvidia-announces-nemoclaw" rel="noopener noreferrer"&gt;NVIDIA NemoClaw Announcement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openclaw.com/" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/how-to-run-openclaw" rel="noopener noreferrer"&gt;How to Run OpenClaw on a DigitalOcean Droplet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://build.nvidia.com/settings/api-keys" rel="noopener noreferrer"&gt;NVIDIA API Keys&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>tutorial</category>
      <category>nemoclaw</category>
      <category>ai</category>
      <category>nvidia</category>
    </item>
    <item>
      <title>GPT 5.3 Codex is the Next Level for Agentic Coding</title>
      <dc:creator>DigitalOcean</dc:creator>
      <pubDate>Thu, 19 Mar 2026 20:00:00 +0000</pubDate>
      <link>https://forem.com/digitalocean/gpt-53-codex-is-the-next-level-for-agentic-coding-52kl</link>
      <guid>https://forem.com/digitalocean/gpt-53-codex-is-the-next-level-for-agentic-coding-52kl</guid>
      <description>&lt;p&gt;Agentic Coding models are one of the obvious and most impressive applications of LLM technologies, and their development has gone hand in hand with massive impacts to markets and job growth. There are numerous players vying to create the best new LLM for all sorts of applications, and many would argue no company and their products in this space have more of a significant impact than OpenAI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://openai.com/index/introducing-gpt-5-3-codex/" rel="noopener noreferrer"&gt;GPT‑5.3‑Codex&lt;/a&gt; is a truly impressive installment in this quest to create the best model. &lt;a href="https://openai.com" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt; promises that GPT-5.3-Codex is their most &lt;a href="https://openai.com/index/introducing-gpt-5-3-codex/" rel="noopener noreferrer"&gt;capable Codex model&lt;/a&gt; yet, advancing both coding performance and professional reasoning beyond GPT-5.2-Codex. Benchmark results show state-of-the-art performance on coding and agentic benchmarks like SWE-Bench Pro and Terminal-Bench, reflecting stronger multi-language and real-world task ability. Furthermore, the model is ~25% faster than &lt;a href="https://openai.com/index/introducing-gpt-5-2-codex/" rel="noopener noreferrer"&gt;GPT-5.2-Codex&lt;/a&gt; for &lt;a href="https://openai.com/codex/" rel="noopener noreferrer"&gt;Codex&lt;/a&gt; users thanks to infrastructure and inference improvements. Overall, GPT‑5.3‑Codex might be the most powerful agentic coding model ever released (&lt;a href="https://openai.com/index/introducing-gpt-5-3-codex/" rel="noopener noreferrer"&gt;Source&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;So let’s see what it can do. Now available on the &lt;a href="https://www.digitalocean.com/products/gradient/platform" rel="noopener noreferrer"&gt;DigitalOcean GradientTM AI Platform&lt;/a&gt; and all OpenAI ChatGPT and Codex resources, we can test the model to see how it performs. In this tutorial, we will show how to use Codex to write a completely new project from scratch. We are going to make a &lt;a href="https://huggingface.co/Tongyi-MAI/Z-Image-Turbo" rel="noopener noreferrer"&gt;Z-Image-Turbo&lt;/a&gt; Real-Time image-to-image application using GPT‑5.3‑Codex, without any user coding! Follow along to learn what GPT‑5.3‑Codex has to offer, how to use GPT‑5.3‑Codex for yourself, and a guide to vibe coding new web applications from scratch!&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;State-of-the-Art Agentic Performance: GPT-5.3-Codex delivers impressive results across software engineering and agentic tasks, outperforming GPT-5.2-Codex in reasoning, multi-language capability, and real-world coding evaluations like SWE-Bench Pro and Terminal-Bench 2.0.&lt;/li&gt;
&lt;li&gt;Getting Started with GPT-5.3-Codex on GradientTM AI Platform is easy: All you need is access to the DigitalOcean Platform to begin integrating your LLM’s calls seamlessly into your workflows at scale.&lt;/li&gt;
&lt;li&gt;From Prototype to Production in Record Time: With roughly 25% improved speed and real-time interactive steering, GPT-5.3-Codex feels less like a static generator and more like a responsive engineering partner capable of iterating, debugging, and refining projects alongside you. By handling scaffolding, architecture decisions, edge cases, and deployment-ready details, GPT-5.3-Codex can dramatically compress development timelines, making it possible to ship fully functional applications from scratch more quickly than ever (&lt;a href="https://openai.com/index/introducing-gpt-5-3-codex/" rel="noopener noreferrer"&gt;Source&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  GPT‑5.3‑Codex Overview
&lt;/h2&gt;

&lt;p&gt;GPT-5.3-Codex is a major agentic coding model upgrade that combines stronger reasoning and professional knowledge with enhanced coding performance, runs about 25 % faster than GPT-5.2-Codex, and excels on real-world and multi-language benchmarks like &lt;a href="https://scale.com/leaderboard/swe_bench_pro_public" rel="noopener noreferrer"&gt;SWE-Bench Pro&lt;/a&gt; and &lt;a href="https://www.tbench.ai/" rel="noopener noreferrer"&gt;Terminal-Bench&lt;/a&gt;. It’s designed to go beyond simple code generation to support full software lifecycle tasks (e.g., debugging, deployment, documentation) and lets you interact and steer it in real time while it’s working, making it feel more like a collaborative partner than a generator. It also has expanded capabilities for long-running work and improved responsiveness, with broader availability across IDEs, CLI, and apps for paid plans. (&lt;a href="https://openai.com/index/introducing-gpt-5-3-codex/" rel="noopener noreferrer"&gt;Source&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6s3njnozmwe93mtdvfg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6s3njnozmwe93mtdvfg.png" alt="image" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we can see from the table above, GPT‑5.3‑Codex is a major step forward over GPT‑5.2‑Codex across software engineering, agentic, and computer use benchmarks. This, paired with the marked improvement in efficiency, make for an incredible indicator of how great this model is. We think this is a significant upgrade to previous GPT Codex model users, as well as new users looking for a powerful agentic coding tool to aid their process.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started with GPT-5.3-Codex
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh22frckrami4z84ep59l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh22frckrami4z84ep59l.png" alt="image" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are two ways to get started with GPT-5.3-Codex that we recommend to developers. First, is accessing the model with Serverless Inference through the &lt;a href="https://www.digitalocean.com/products/gradient/platform" rel="noopener noreferrer"&gt;GradientTM AI Platform&lt;/a&gt;. With Serverless Inference, we can Pythonically integrate the LLM generations into any pipeline. All you need to do is create a model access key, and begin generating! For more information on getting started, check out the official &lt;a href="https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffurv5tcadtlwz8jloy21.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffurv5tcadtlwz8jloy21.png" alt="image" width="800" height="511"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The other way to get started quickly is the official OpenAI Codex application. It’s easy to get started with Codex on your local machine. Simply download the application onto your computer, and launch it. You will then be prompted to log in to your account. From there, simply choose which project you wish to work in, and you’re ready to get started!&lt;/p&gt;

&lt;h2&gt;
  
  
  Vibe Coding a Z-Image-Turbo Web Application with GPT‑5.3‑Codex
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fevd2jw8py8w20fzi25x1.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fevd2jw8py8w20fzi25x1.gif" alt="image" width="560" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So now that we have heard about how GPT‑5.3‑Codex performs, let’s see it in action. For this experiment, we sought to see how the model performed on a relatively novel assignment that has a basis in past applications. In this case, we asked it to create a real-time image-to-image pipeline for Z-Image-Turbo that uses webcam footage as image input.&lt;/p&gt;

&lt;p&gt;To do this, we created a blank new directory/project space to work in. We then asked the model to create a skeleton of the project to begin, and then iteratively added in the missing features on subsequent queries. Overall, we were able to create a full working version of the application with just 5 prompts and 30 minutes of testing. This extreme speed made it possible to ship the project in less than a day, from inspiration to completion. Now let’s take a closer look at the application project itself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fau60yz6xtsq15q936e6e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fau60yz6xtsq15q936e6e.png" alt="image" width="800" height="363"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This project, which can be found &lt;a href="https://github.com/Jameshskelton/z-image-turbo-realtime" rel="noopener noreferrer"&gt;here&lt;/a&gt;, is a real-time webcam-driven image-to-image generation application built in Python around a &lt;a href="https://www.gradio.app/" rel="noopener noreferrer"&gt;Gradio&lt;/a&gt; interface and a dedicated Z-Image-Turbo inference engine, where the UI in app.py presents side-by-side live input and generated output panes, parameter controls, and explicit Start/Stop gating so inference only runs when requested, while the backend in inference.py loads Tongyi-MAI/Z-Image-Turbo via ZImageImg2ImgPipeline, introspects the pipeline signature to bind the correct image-conditioning argument, enforces true img2img semantics instead of prompt-only generation, and executes inference in torch.inference_mode() with dynamic argument wiring so behavior adapts to the installed diffusers API. Critically, it can compute per-frame target resolution from webcam aspect ratio, snapping dimensions to a model-friendly multiple (default 16), and caps both sides below 1024, then applies post-generation safeguards that made the app stable in practice: dtype strategy (auto preferring bf16 then fp32, avoiding fp16 black-frame failure modes), degenerate-output detection with automatic float32 recovery, robust PIL/NumPy/Tensor output decoding and normalization, effective-strength clamping to preserve source structure, frame-hash seed mixing so scene changes influence results, and configurable structure-preserving input blending, all parameterized in config.py and documented in the &lt;a href="https://github.com/Jameshskelton/z-image-turbo-realtime?tab=readme-ov-file#readme" rel="noopener noreferrer"&gt;README.md&lt;/a&gt;, with runtime status reporting latency plus internal diagnostics (pipe, dtype, size, effective strength, blend, seed, warnings) so you can observe exactly how each frame is being processed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;GPT-5.3-Codex feels less like an incremental update and more like a meaningful shift in how developers interact with code. The combination of stronger reasoning,  benchmark gains seen in testing, and a noticeable speed improvement makes it clear that agentic coding is maturing into something even more production-ready. What once required hours of boilerplate, debugging, and manual wiring can now be orchestrated through iterative prompts and high-level direction. As we demonstrated with the Z-Image-Turbo real-time application, a fully functional project can move from blank directory to working prototype in much less  time traditionally required. While the actual results and performance benefits you experience will vary based on specific project requirements, complexity, and individual developer workflows, we are confident that GPT-5.3-Codex provides a substantial upgrade and a meaningful step forward in agentic coding capability, as evidenced by its stronger reasoning and measurable benchmark gains.&lt;/p&gt;

&lt;p&gt;We recommend trying out GPT-5.3-Codex in all contexts, especially with &lt;a href="https://www.digitalocean.com/products/gradient/platform" rel="noopener noreferrer"&gt;DigitalOcean’s GradientTM AI Platform&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>chatgpt</category>
      <category>coding</category>
      <category>tutorial</category>
      <category>codex</category>
    </item>
    <item>
      <title>Getting Started with Qwen3.5 Vision-Language Models</title>
      <dc:creator>DigitalOcean</dc:creator>
      <pubDate>Tue, 17 Mar 2026 16:00:00 +0000</pubDate>
      <link>https://forem.com/digitalocean/getting-started-with-qwen35-vision-language-models-3ej3</link>
      <guid>https://forem.com/digitalocean/getting-started-with-qwen35-vision-language-models-3ej3</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally written by James Skelton (Senior AI/ML Technical Content Strategist II)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/visualizing-vision-language-models-multimodal-reasoning" rel="noopener noreferrer"&gt;Vision Language models&lt;/a&gt; are one of the most powerful and highest potential applications of deep learning technologies. The reasoning behind such a strong assertion lies in the versatility of VL modeling: from document understanding to object tracking to image captioning, vision language models are likely going to be the building blocks of the incipient, physical AI future. This is because everything that we can interact with that will be powered by AI - from robots to driverless vehicles to medical assistants - will likely have a VL model in its pipeline.&lt;/p&gt;

&lt;p&gt;This is why the power of open-source development is so important to all of these disciplines and applications of AI, and why we are so excited about the release of &lt;a href="https://qwen.ai/blog?id=qwen3.5" rel="noopener noreferrer"&gt;Qwen3.5&lt;/a&gt; from Qwen Team. This &lt;a href="https://huggingface.co/collections/Qwen/qwen35" rel="noopener noreferrer"&gt;suite of completely open source VL models&lt;/a&gt;, ranging in size from .8B to 397B (with activated 17B) parameters, is the clear next step forward for VL modeling. They excel at bench marks for everything from agentic coding to computer use to document understanding, and nearly match closed source rivals in terms of capabilities.&lt;/p&gt;

&lt;p&gt;In this tutorial, we will examine and show how to make the best use of Qwen3.5 using a &lt;a href="https://www.digitalocean.com/products/gradient/gpu-droplets" rel="noopener noreferrer"&gt;Gradienttm GPU Droplet&lt;/a&gt;. Follow along for explicit instructions on how to setup and run your GPU Droplet to power Qwen3.5 to power applications like Claude Code and Codex using your own resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Qwen3.5 VL demonstrates the growing power of open &lt;a href="https://www.digitalocean.com/solutions/multimodal-ai" rel="noopener noreferrer"&gt;multimodal AI&lt;/a&gt;. The fully open-source model suite spans from 0.8B to 397B parameters and achieves strong benchmark performance across tasks like coding, document understanding, and computer interaction, approaching the capabilities of leading proprietary models.&lt;/li&gt;
&lt;li&gt;Its architecture enables efficient large-scale multimodal training. By decoupling vision and language parallelism strategies, using sparse activations, and employing an FP8 training pipeline, Qwen3.5 improves hardware utilization, reduces memory usage, and maintains high throughput even when training on mixed text, image, and video data.&lt;/li&gt;
&lt;li&gt;Developers can deploy Qwen3.5 on their own infrastructure. With tools like Ollama and GPU Droplets, it is possible to run large Qwen3.5 models locally or in the cloud to power applications such as coding assistants, computer-use agents, and custom AI tools without relying on proprietary APIs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Qwen3.5: Overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv3v5lob56ux6d9h1yzny.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv3v5lob56ux6d9h1yzny.jpg" alt="image" width="800" height="516"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Qwen3.5 is a fascinating model suite with a unique architecture. It “enables efficient native multimodal training via a heterogeneous infrastructure that decouples parallelism strategies across vision and language components” (&lt;a href="https://qwen.ai/blog?id=qwen3.5" rel="noopener noreferrer"&gt;Source&lt;/a&gt;). This helps to make it avoid uniform approaches’ inefficiencies, such as over-allocating compute to lighter modalities, synchronization bottlenecks between vision and language towers, memory imbalance across devices, and reduced scaling efficiency when both modalities are forced into the same parallelism strategy.&lt;/p&gt;

&lt;p&gt;By leveraging sparse activations to enable overlapping computation across model components, the system reaches nearly the same training throughput as pure text-only baselines even when trained on mixed text, image, and video datasets. Alongside this, a native FP8 training pipeline applies low-precision computation to activations, Mixture-of-Experts (MoE) routing, and GEMM operations. Runtime monitoring dynamically preserves BF16 precision in numerically sensitive layers, reducing activation memory usage by roughly 50% and delivering more than a 10% training speed improvement while maintaining stable scaling to tens of trillions of tokens.&lt;/p&gt;

&lt;p&gt;To further leverage reinforcement learning at scale, the team developed an asynchronous RL framework capable of training Qwen3.5 models across all sizes, supporting text-only, multimodal, and multi-turn interaction settings. The system uses a fully disaggregated &lt;a href="https://www.digitalocean.com/community/tutorials/llm-inference-optimization" rel="noopener noreferrer"&gt;training–inference architecture&lt;/a&gt;, allowing training and rollout generation to run independently while improving hardware utilization, enabling dynamic load balancing, and supporting fine-grained fault recovery. Through techniques such as end-to-end FP8 training, rollout router replay, speculative decoding, and multi-turn rollout locking, the framework increases throughput while maintaining strong consistency between training and inference behavior.&lt;/p&gt;

&lt;p&gt;This system–algorithm co-design also constrains gradient staleness and reduces data skew during asynchronous updates, preserving both training stability and model performance. In addition, the framework is built to support agentic workflows natively, enabling uninterrupted multi-turn interactions within complex environments. Its decoupled architecture can scale to millions of concurrent agent scaffolds and environments, which helps improve generalization during training. Together, these optimizations produce a 3×–5× improvement in end-to-end training speed while maintaining strong stability, efficiency, and scalability (&lt;a href="https://qwen.ai/blog?id=qwen3.5" rel="noopener noreferrer"&gt;Source&lt;/a&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  Qwen3.5 Demo
&lt;/h2&gt;

&lt;p&gt;Getting started with Qwen3.5 is very simple. Thanks to the foresight of Qwen Team &amp;amp; their collaborators, their are numerous ways to access and run the Qwen3.5 model suite’s models from your own machine. Of course, running the larger models will require significantly more computational resources. We recommend at least an 8x &lt;a href="https://www.digitalocean.com/community/tutorials/nvidia-h200-gpu-droplet" rel="noopener noreferrer"&gt;NVIDIA H200&lt;/a&gt; setup for the larger models in particular, though a single H200 is sufficient for this tutorial. We are going to use Ollama to power &lt;a href="https://huggingface.co/Qwen/Qwen3.5-122B-A10B" rel="noopener noreferrer"&gt;Qwen3.5-122B-A10B&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To get started, simply start up a GPU Droplet with an NVIDIA H200 with your &lt;a href="https://www.digitalocean.com/community/tutorials/how-to-configure-ssh-key-based-authentication-on-a-linux-server" rel="noopener noreferrer"&gt;SSH key&lt;/a&gt; attached, and SSH in using the terminal on your local machine. From there, navigate to the base directory of your choice. Create a new directory with &lt;code&gt;mkdir&lt;/code&gt; to represent your new workspace, and change into the directory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating a custom game with Qwen3.5 running on Ollama and Claude Code
&lt;/h3&gt;

&lt;p&gt;For this demo, we are going to do something simple: create a Python based video game for one of the most popular Winter Olympics sports: curling. To get started, paste the following code into the remote terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh
ollama launch claude &lt;span class="nt"&gt;--model&lt;/span&gt; qwen3.5:122b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fop1la5cjyv0riseeoleb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fop1la5cjyv0riseeoleb.png" alt="image" width="800" height="165"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This will launch Claude Code. If everything worked, it should look like above. From here, we can begin giving instructions to our model to begin generating code!&lt;/p&gt;

&lt;p&gt;For this demo, provide it with a base set of instructions. Try customizing the following input:&lt;/p&gt;

&lt;p&gt;“I want to create a simple game of curling in python code. i want it to be playable on my computer. Please create a sample Python program.&lt;/p&gt;

&lt;p&gt;Packages: pygame”&lt;/p&gt;

&lt;p&gt;This will give you, if your model ran predictably, a python file named something like “curling_game.py” with a full game’s code inside. Simply download this file onto your local computer, open the terminal and run it with &lt;code&gt;python3.11 curling_game.py&lt;/code&gt;. Our game looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5yrbeeqys9timusj8qd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5yrbeeqys9timusj8qd.png" alt="image" width="800" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But looks are deceiving: this game is far from playable in the one-shot state. It requires serious work to amend the code to make the game playable, especially for two players. We can either use Claude Code with Qwen3.5 to make those adjustments, switch to an Anthropic Model like &lt;a href="https://www.digitalocean.com/community/tutorials/claude-sonnet" rel="noopener noreferrer"&gt;Sonnet 4.6&lt;/a&gt; or &lt;a href="https://www.digitalocean.com/community/tutorials/claude-opus" rel="noopener noreferrer"&gt;Opus 4.6&lt;/a&gt;, or make the changes manually. From this base state, it took Qwen3.5 over an hour and at least 10 requests to make the game playable. Time was notably constrained by the single H200 GPU deployment we used for this demo, but the code output leaves significant room for improvement nonetheless. We expect that Opus 4.6 could accomplish the same task in a much quicker time frame, given its optimization for &lt;a href="https://www.digitalocean.com/community/tutorials/claude-code-gpu-droplets-vscode" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;, relatively superior benchmark scores, and more optimized infrastructure for inference.&lt;/p&gt;

&lt;p&gt;If you want to try it out, this file can be found on Github &lt;a href="https://gist.github.com/Jameshskelton/02be269e8d50f724cc910b35f6296e9c" rel="noopener noreferrer"&gt;Gist&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;Qwen3.5 VL represents an important step forward for open-source multimodal AI, demonstrating that publicly available models can increasingly rival proprietary systems in capability while offering far greater flexibility for developers. With its scalable architecture, efficient training infrastructure, and strong performance across tasks like coding, document understanding, and computer use, the Qwen3.5 suite highlights the growing maturity of the open AI ecosystem. As tools like GPU Droplets and frameworks such as Ollama make deploying large models easier than ever, vision-language systems like Qwen3.5 are poised to become foundational components in the next generation of AI-powered applications and physical AI systems.&lt;/p&gt;

</description>
      <category>qwen</category>
      <category>learning</category>
      <category>aimodels</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>7 OpenClaw Security Challenges to Watch for in 2026</title>
      <dc:creator>DigitalOcean</dc:creator>
      <pubDate>Thu, 12 Mar 2026 16:00:00 +0000</pubDate>
      <link>https://forem.com/digitalocean/7-openclaw-security-challenges-to-watch-for-in-2026-46b1</link>
      <guid>https://forem.com/digitalocean/7-openclaw-security-challenges-to-watch-for-in-2026-46b1</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally written by Fadeke Adegbuyi (Manager, Content Marketing)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;OpenClaw isn’t just another chatbot wrapper. It executes shell commands, controls your browser, manages your calendar, reads and writes files, and remembers everything across sessions. The &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;project&lt;/a&gt; runs locally on your machine and connects to WhatsApp, Telegram, iMessage, Discord, Slack, and over a dozen other platforms via &lt;a href="https://openclaw.ai/integrations" rel="noopener noreferrer"&gt;pre-built integrations&lt;/a&gt;. It functions as a truly connected personal assistant. As a result, the use cases people have dreamed up for OpenClaw are wild.&lt;/p&gt;

&lt;p&gt;One user showed an OpenClaw agent &lt;a href="https://x.com/xmayeth/status/2020883912734425389" rel="noopener noreferrer"&gt;making money on Polymarket&lt;/a&gt; by monitoring news feeds and executing trades automatically. Another gave their bot access to &lt;a href="https://x.com/MatznerJon/status/2019044317621567811" rel="noopener noreferrer"&gt;home surveillance cameras&lt;/a&gt;. Someone else &lt;a href="https://x.com/nickvasiles/status/2021391007800328683" rel="noopener noreferrer"&gt;&lt;/a&gt;unleashed subagents to apply for &lt;a href="https://x.com/nickvasiles/status/2021391007800328683" rel="noopener noreferrer"&gt;UpWork freelancing jobs&lt;/a&gt; on their behalf.&lt;/p&gt;

&lt;p&gt;

&lt;iframe class="tweet-embed" id="tweet-2019044317621567811-81" src="https://platform.twitter.com/embed/Tweet.html?id=2019044317621567811"&gt;
&lt;/iframe&gt;

  // Detect dark theme
  var iframe = document.getElementById('tweet-2019044317621567811-81');
  if (document.body.className.includes('dark-theme')) {
    iframe.src = "https://platform.twitter.com/embed/Tweet.html?id=2019044317621567811&amp;amp;theme=dark"
  }





&lt;/p&gt;

&lt;p&gt;But this kind of access to your digital life comes with real consequences when things go wrong. And things have gone wrong. Security researchers found that &lt;a href="https://www.404media.co/silicon-valleys-favorite-new-ai-agent-has-serious-security-flaws/" rel="noopener noreferrer"&gt;&lt;/a&gt;the agent shipped with &lt;a href="https://www.404media.co/silicon-valleys-favorite-new-ai-agent-has-serious-security-flaws/" rel="noopener noreferrer"&gt;serious flaws&lt;/a&gt; that made it possible for attackers to hijack machines with a single malicious link. Meanwhile, &lt;a href="https://www.digitalocean.com/resources/articles/what-is-moltbook" rel="noopener noreferrer"&gt;Moltbook&lt;/a&gt;, a Reddit-style platform with over 2.8 million AI agents, had its database completely &lt;a href="https://www.404media.co/exposed-moltbook-database-let-anyone-take-control-of-any-ai-agent-on-the-site/" rel="noopener noreferrer"&gt;exposed&lt;/a&gt;, so anyone could take control of any AI agent on the platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;None of this means you should avoid OpenClaw entirely&lt;/strong&gt;. It means you should understand OpenClaw security challenges and take precautions before spinning up an agent with root access to your laptop. Running OpenClaw in an isolated cloud environment can help  neutralize some of these risks—DigitalOcean's &lt;a href="https://www.digitalocean.com/blog/moltbot-on-digitalocean" rel="noopener noreferrer"&gt;1-Click Deploy for OpenClaw&lt;/a&gt;, for example, handles authentication, firewall rules, and container isolation out of the box so your personal machine stays out of the equation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are OpenClaw security challenges?
&lt;/h2&gt;

&lt;p&gt;OpenClaw security challenges boil down to a design tension: the tool needs broad system permissions to be useful, but those permissions create a massive attack surface when something goes wrong. The agent runs with whatever privileges your user account has—full disk, terminal, and network access—by design.&lt;/p&gt;

&lt;p&gt;It's also &lt;a href="https://www.digitalocean.com/resources/articles/agentic-ai" rel="noopener noreferrer"&gt;agentic&lt;/a&gt; and self-improving, meaning it can modify its own behavior, update its memory, and install new skills autonomously. This is impressive from a capability standpoint, but another vector that can cause things to spiral when guardrails are missing. Pair that with defaults that skip authentication, an unvetted skill marketplace, and persistent memory storing weeks of context, and trouble follows. The takeaway: approach with caution, isolate from production systems, and carefully scrutinize the defaults.&lt;/p&gt;

&lt;p&gt;To his credit, OpenClaw creator &lt;a href="https://x.com/steipete" rel="noopener noreferrer"&gt;Peter Steinberger&lt;/a&gt; has been openly vocal about these risks and actively encourages running OpenClaw in a &lt;a href="https://docs.openclaw.ai/gateway/sandboxing" rel="noopener noreferrer"&gt;sandboxed environment&lt;/a&gt;, which isolates tool execution inside Docker containers to limit filesystem and process access when the model misbehaves. DigitalOcean's one-click deployment does exactly this out of the box, giving you that isolation without the manual setup.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/n2MrUtIT1m4"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h2&gt;
  
  
  7 OpenClaw security challenges to watch out for
&lt;/h2&gt;

&lt;p&gt;We've already seen a security audit &lt;a href="https://www.kaspersky.com/blog/openclaw-vulnerabilities-exposed/55263/" rel="noopener noreferrer"&gt;uncover 512 vulnerabilities&lt;/a&gt; (eight critical) and &lt;a href="https://thehackernews.com/2026/02/researchers-find-341-malicious-clawhub.html" rel="noopener noreferrer"&gt;malicious ClawHub skills&lt;/a&gt; stealing cryptocurrency wallets. None of these challenges are theoretical. They're all based on incidents that have already played out within weeks of OpenClaw’s launch.&lt;/p&gt;

&lt;p&gt;These are the challenges you need to have on your radar if you're experimenting with OpenClaw:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. One-click remote code execution through WebSocket hijacking
&lt;/h3&gt;

&lt;p&gt;One of the most alarming OpenClaw vulnerabilities discovered so far is &lt;a href="https://thehackernews.com/2026/02/openclaw-bug-enables-one-click-remote.html" rel="noopener noreferrer"&gt;CVE-2026-25253&lt;/a&gt;, a one-click remote code execution flaw that Mav Levin, a founding researcher at DepthFirst, disclosed in late January 2026. The attack worked because OpenClaw's local server didn’t validate the WebSocket origin header—so any website you visited could silently connect to your running agent. An attacker just needed you to click one link. From there, they chained a cross-site WebSocket hijack into full code execution on your machine. The compromise happened in milliseconds. This is the core danger of running an agent locally on the same machine you're browsing the web with—one careless click and an attacker is already inside.&lt;/p&gt;

&lt;p&gt;Levin's proof-of-concept showed that visiting a single malicious webpage was enough to steal authentication tokens and gain operator-level access to the gateway API—giving an attacker access to change your config, read your files, and run commands.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security checks&lt;/strong&gt;: In this instance, the fix landed in &lt;a href="https://github.com/openclaw/openclaw/releases" rel="noopener noreferrer"&gt;version 2026.1.29&lt;/a&gt;, so update immediately if you’re a version behind. Beyond that, best practices include avoiding running OpenClaw while browsing untrusted sites and considering putting the agent behind a reverse proxy with proper origin validation for an additional layer of protection.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Tens of thousands of unprotected OpenClaw instances sitting open on the internet
&lt;/h3&gt;

&lt;p&gt;Here's the thing about OpenClaw's early defaults: the agent trusted any connection from localhost without asking for a password. That sounded fine until the gateway sits behind a misconfigured reverse proxy—at which point every external request got forwarded to 127.0.0.1, and your agent thought the whole internet was a trusted local user. SecurityScorecard's STRIKE team &lt;a href="https://www.bitsight.com/blog/openclaw-ai-security-risks-exposed-instances" rel="noopener noreferrer"&gt;&lt;/a&gt;found over &lt;a href="https://www.bitsight.com/blog/openclaw-ai-security-risks-exposed-instances" rel="noopener noreferrer"&gt;30,000 internet-exposed OpenClaw instances&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Security researcher &lt;a href="https://x.com/theonejvo/status/2015401219746128322" rel="noopener noreferrer"&gt;Jamieson O'Reilly showed&lt;/a&gt; just how bad this gets. He accessed Anthropic API keys, Telegram bot tokens, Slack accounts, and complete chat histories from exposed instances, even sending messages on behalf of users and running commands with full admin privileges. No authentication required.&lt;/p&gt;

&lt;p&gt;This has since been addressed—&lt;a href="https://docs.openclaw.ai/gateway#runtime-model" rel="noopener noreferrer"&gt;gateway auth&lt;/a&gt; is now required by default, and the onboarding wizard auto-generates a token even for localhost.&lt;/p&gt;

&lt;p&gt;

&lt;iframe class="tweet-embed" id="tweet-2015401219746128322-801" src="https://platform.twitter.com/embed/Tweet.html?id=2015401219746128322"&gt;
&lt;/iframe&gt;

  // Detect dark theme
  var iframe = document.getElementById('tweet-2015401219746128322-801');
  if (document.body.className.includes('dark-theme')) {
    iframe.src = "https://platform.twitter.com/embed/Tweet.html?id=2015401219746128322&amp;amp;theme=dark"
  }





&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security checks&lt;/strong&gt;: At a minimum, check whether your instance is reachable from the public internet. Use a &lt;a href="https://www.digitalocean.com/resources/articles/cloud-firewall" rel="noopener noreferrer"&gt;firewall&lt;/a&gt; to restrict access, enable gateway token authentication, and never expose the control plane without a &lt;a href="https://www.digitalocean.com/solutions/vpn" rel="noopener noreferrer"&gt;VPN&lt;/a&gt; or &lt;a href="https://www.digitalocean.com/community/tutorials/ssh-essentials-working-with-ssh-servers-clients-and-keys" rel="noopener noreferrer"&gt;SSH tunnel&lt;/a&gt; in front of it. This is a  case where a managed cloud deployment can solve the problem outright—because your personal API keys, chat histories, and credentials aren’t sitting on an exposed local machine in the first place.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Malicious skills on ClawHub are poisoning the supply chain
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/openclaw/clawhub" rel="noopener noreferrer"&gt;ClawHub&lt;/a&gt;, OpenClaw's public skill marketplace, lets anyone publish an extension—the only requirement is a GitHub account older than one week. That low bar has unfortunately turned the marketplace into a target. Koi Security &lt;a href="https://www.koi.ai/blog/clawhavoc-341-malicious-clawedbot-skills-found-by-the-bot-they-were-targeting" rel="noopener noreferrer"&gt;audited all 2,857 skills on ClawHub&lt;/a&gt; and found 341 that were outright malicious. Bitdefender's independent scan put the number closer to &lt;a href="https://www.bitdefender.com/en-us/blog/businessinsights/technical-advisory-openclaw-exploitation-enterprise-networks" rel="noopener noreferrer"&gt;900 malicious skills&lt;/a&gt;, roughly 20% of all packages. A single account—"hightower6eu"—uploaded 354 malicious packages by itself.&lt;/p&gt;

&lt;p&gt;The attack is clever. You install what looks like a useful skill and the documentation looks professional. But buried in a "Prerequisites" section, it asks you to install something first—and that something is Atomic Stealer (&lt;a href="https://www.darktrace.com/blog/atomic-stealer-darktraces-investigation-of-a-growing-macos-threat" rel="noopener noreferrer"&gt;AMOS&lt;/a&gt;), a macOS credential-stealing malware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security checks&lt;/strong&gt;: OpenClaw has since &lt;a href="https://openclaw.ai/blog/virustotal-partnership" rel="noopener noreferrer"&gt;partnered with VirusTotal&lt;/a&gt; to scan new skill uploads, but Steinberger himself admitted this isn't a silver bullet. At a minimum, before installing any skill, read its source code. Check the publisher's account age and history. Put simply, treat every skill as untrusted code running with your agent's full permissions. Unlike some exposure risks, malicious skills are a threat regardless of where OpenClaw runs—a poisoned skill executes the same way on a cloud server as it does on your laptop.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Credential storage in plaintext and API key leakage
&lt;/h3&gt;

&lt;p&gt;One of the less glamorous but more dangerous issues is how OpenClaw handles secrets. The platform &lt;a href="https://permiso.io/blog/inside-the-openclaw-ecosystem-ai-agents-with-privileged-credentials" rel="noopener noreferrer"&gt;stores credentials in plaintext&lt;/a&gt;—including API keys for your LLM provider and tokens for every messaging platform your agent connects to—and those become targets the moment your instance is accessible to anyone other than you. Prompt injection attacks can also trick the agent into exfiltrating credentials by embedding hidden instructions in content the agent processes.&lt;/p&gt;

&lt;p&gt;Cisco's team tested a skill called &lt;a href="https://blogs.cisco.com/ai/personal-ai-agents-like-openclaw-are-a-security-nightmare" rel="noopener noreferrer"&gt;"What Would Elon Do?"&lt;/a&gt; and surfaced nine security findings, two of them critical. The skill instructed the bot to execute a curl command sending data to an external server controlled by the skill's author. Functionally, it was malware hiding behind a joke name.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security check&lt;/strong&gt;: At a minimum, rotate your API keys regularly and store secrets using environment variables or a dedicated secrets manager rather than config files. It's also worth setting spending limits on your LLM provider accounts. That way, even if a key is compromised, it can't rack up thousands in charges.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Prompt injection attacks amplified by persistent memory
&lt;/h3&gt;

&lt;p&gt;What makes prompt injection in OpenClaw worse than in a typical &lt;a href="https://www.digitalocean.com/resources/articles/ai-agent-vs-ai-chatbot" rel="noopener noreferrer"&gt;chatbot&lt;/a&gt; is the persistent memory. The agent retains long-term context, preferences, and conversation history across sessions—which is one of its best features. But it also means a malicious instruction embedded in a website, email, or document doesn't have to execute immediately. Palo Alto Networks warned that these become "&lt;a href="https://www.paloaltonetworks.com/blog/network-security/why-moltbot-may-signal-ai-crisis/" rel="noopener noreferrer"&gt;stateful, delayed-execution attacks&lt;/a&gt;". A hidden prompt in a PDF you opened last Tuesday could sit dormant in the agent's memory until a future task triggers it days later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security check&lt;/strong&gt;: There's no perfect fix for prompt injection right now; it's an unresolved problem in agentic AI. But you can reduce the blast radius by limiting what tools and permissions your agent has access to, segmenting its access to sensitive systems, and reviewing its memory and context periodically for anything unexpected.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Shadow AI spreading through enterprise networks
&lt;/h3&gt;

&lt;p&gt;This one's for anyone working at a company where developers tinker on their work machines. Token Security found that &lt;a href="https://www.token.security/blog/the-clawdbot-enterprise-ai-risk-one-in-five-have-it-installed" rel="noopener noreferrer"&gt;22% of their enterprise customers&lt;/a&gt; have employees running OpenClaw as shadow AI without IT approval. Bitdefender confirmed the same, showing &lt;a href="https://businessinsights.bitdefender.com/technical-advisory-openclaw-exploitation-enterprise-networks" rel="noopener noreferrer"&gt;employees deploying agents&lt;/a&gt; on corporate machines connected to internal networks. An OpenClaw agent on a developer's laptop with VPN access to production means every vulnerability above is now a business problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security check&lt;/strong&gt;: If you're on a security team, you should scan your network for OpenClaw instances now. Set up detection for its WebSocket traffic patterns, and mandate that any approved use runs in an isolated environment—a VM or cloud server—rather than on laptops with internal access. Giving teams an approved, isolated deployment path is the fastest way to get ahead of shadow AI—it's much easier to enforce guardrails when the alternative isn't 'don't use it at all.'&lt;/p&gt;

&lt;h3&gt;
  
  
  7. The Moltbook database breach exposing millions of agent credentials
&lt;/h3&gt;

&lt;p&gt;The security mess isn't limited to OpenClaw itself. Moltbook, the social network for AI agents built by &lt;a href="https://x.com/MattPRD" rel="noopener noreferrer"&gt;Matt Schlicht&lt;/a&gt;, &lt;a href="https://www.404media.co/exposed-moltbook-database-let-anyone-take-control-of-any-ai-agent-on-the-site/" rel="noopener noreferrer"&gt;suffered a database exposure&lt;/a&gt; that cybersecurity firm Wiz discovered in early February. The database had zero access controls. Anyone who found it could view 1.5 million API tokens, 35,000 email addresses, and private messages between agents—enough to take control of any agent on the platform. China's Ministry of Industry and Information Technology &lt;a href="https://www.reuters.com/world/china/china-warns-security-risks-linked-openclaw-open-source-ai-agent-2026-02-05/" rel="noopener noreferrer"&gt;issued a formal warning&lt;/a&gt; about OpenClaw security risks, citing incidents like this breach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security check&lt;/strong&gt;: If you've used Moltbook, rotate every API key and token associated with your agent. Treat third-party platforms in the OpenClaw ecosystem with the same skepticism you'd apply to any new service asking for your credentials and consider additional security checks.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Any references to third-party companies, trademarks, or logos in this document are for informational purposes only and do not imply any affiliation with, sponsorship by, or endorsement of those third parties.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Pricing and product information accurate as of February 2026.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openclaw</category>
      <category>security</category>
      <category>learning</category>
    </item>
    <item>
      <title>GPU Programming for Beginners: ROCm + AMD Setup to Edge Detection</title>
      <dc:creator>DigitalOcean</dc:creator>
      <pubDate>Tue, 10 Mar 2026 16:00:00 +0000</pubDate>
      <link>https://forem.com/digitalocean/gpu-programming-for-beginners-rocm-amd-setup-to-edge-detection-29bm</link>
      <guid>https://forem.com/digitalocean/gpu-programming-for-beginners-rocm-amd-setup-to-edge-detection-29bm</guid>
      <description>&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/TdHexc0Garg"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;In this hands-on tutorial, we demystify GPU computation and show you how to write your own GPU programs from scratch. Understanding GPU programming is essential for anyone looking to grasp why AI models depend on this specialized hardware.&lt;/p&gt;

&lt;p&gt;We'll use ROCm and HIP (AMD's version of CUDA) to take you from zero to running real GPU code, culminating in a computer vision edge detector that processes images in parallel.&lt;/p&gt;

&lt;p&gt;You can find the code in the &lt;strong&gt;project repository&lt;/strong&gt;: &lt;a href="https://github.com/oconnoob/intro_to_rocm_hip/blob/main/README.md" rel="noopener noreferrer"&gt;https://github.com/oconnoob/intro_to_rocm_hip/blob/main/README.md&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;👇 WHAT YOU'LL LEARN IN THIS VIDEO 👇&lt;/p&gt;

&lt;p&gt;🔧 &lt;strong&gt;Getting Set Up with ROCm Two ways to get started&lt;/strong&gt;: spin up a GPU Droplet on DigitalOcean with ROCm pre-installed, or install ROCm yourself on an Ubuntu system with an AMD GPU. We cover both methods step-by-step.&lt;/p&gt;

&lt;p&gt;➕ &lt;strong&gt;Example 1&lt;/strong&gt;: Vector Addition (The Basics) Learn the fundamental structure of GPU programs—kernels, threads, blocks, and memory management. We'll add one million elements in parallel and verify our results.&lt;/p&gt;

&lt;p&gt;⚡ &lt;strong&gt;Example 2&lt;/strong&gt;: Matrix Multiplication (Why Libraries Matter) Discover why optimized libraries like rocBLAS dramatically outperform naive implementations. This is the operation powering most AI models you use daily.&lt;/p&gt;

&lt;p&gt;👁️ &lt;strong&gt;Example 3&lt;/strong&gt;: Edge Detection with Sobel Filter (The Cool Stuff) Apply your GPU programming skills to a real computer vision problem—detecting edges in images using a classic Sobel filter, all running massively parallel on the GPU.&lt;/p&gt;

&lt;p&gt;Whether you're an AI enthusiast wanting to understand the hardware layer or a developer looking to harness GPU compute power, this tutorial gives you the foundation to start writing efficient parallel programs.&lt;/p&gt;

</description>
      <category>gpu</category>
      <category>amd</category>
      <category>programming</category>
      <category>ai</category>
    </item>
    <item>
      <title>February 2026 DigitalOcean Tutorials: Claude 4.6 and AI Agents</title>
      <dc:creator>Jess Lulka</dc:creator>
      <pubDate>Thu, 05 Mar 2026 17:00:00 +0000</pubDate>
      <link>https://forem.com/digitalocean/february-2026-digitalocean-tutorials-claude-46-and-ai-agents-14pn</link>
      <guid>https://forem.com/digitalocean/february-2026-digitalocean-tutorials-claude-46-and-ai-agents-14pn</guid>
      <description>&lt;p&gt;Whether you’ve found yourself exploring Anthropic’s latest Claude Opus 4.6 release or following along with the OpenClaw frenzy, &lt;a href="https://www.digitalocean.com/community/tutorials" rel="noopener noreferrer"&gt;DigitalOcean&lt;/a&gt; has tutorials and guides to help you get the most out of the latest AI advancements. &lt;/p&gt;

&lt;p&gt;These 10 tutorials from last month cover AI agent development, RAG troubleshooting, CUDA performance tuning, and OpenClaw on DigitalOcean. Bookmark them for later or keep them open among your 50 browser tabs to come back to.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.digitalocean.com/community/tutorials/claude-opus" rel="noopener noreferrer"&gt;What’s New With Claude Opus 4.6&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Claude Opus 4.6’s agentic coding model feels less like a coding assistant and more like a collaborative engineer. Developers now have a massive 1M-token context window, which lets the model reason across entire codebases, docs, and long workflows without constantly re-prompting. This means faster refactors, more reliable debugging, and the ability to make iterative UI or architecture changes with just a few guided prompts. Long context plus agentic planning dramatically reduces the time between the idea and working implementation, especially when the model is directly integrated into your cloud stack. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskezjlwkt14l5zi8ddn7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskezjlwkt14l5zi8ddn7.png" alt="Claude feature benchmarks" width="800" height="465"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.digitalocean.com/community/conceptual-articles/self-learning-ai-agents" rel="noopener noreferrer"&gt;Self-Learning AI Agents: A High-Level Overview&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Self-learning agents follow a fundamental loop: observe, act, get feedback, and improve. For developers, these systems aren’t just prompt-driven. They’re built around policies, reward signals, and evolving memory. We make the concept approachable by showing how you can prototype simple versions with standard Python ML tooling. This tutorial can help you determine whether your agent needs to adapt to changing environments or user behavior. You’ll also get a look at how reinforcement-style learning and persistent memory become essential design choices.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.digitalocean.com/community/tutorials/cuda-performance-tuning-workflow" rel="noopener noreferrer"&gt;CUDA Guide: Workflow for Performance Tuning&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Frustrated by the guesswork involved in GPU optimization? We’ve got a step-by-step guide for you. Learn how to profile first, identify the real bottleneck—memory, compute, or occupancy—and then apply targeted optimizations rather than random tweaks. For developers working with AI or HPC workloads, the biggest win is understanding that most performance gains come from a structured workflow, not exotic kernel tricks. You’ll learn that knowing how to measure, optimize, and re-measure is the only reliable path to predictable CUDA speedups.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.digitalocean.com/community/tutorials/build-ai-agents-the-right-way" rel="noopener noreferrer"&gt;A Simple Guide to Building AI Agents Correctly&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;This tutorial is a production blueprint for agentic systems. It covers why naive agent loops fail—runaway costs, hallucinated tool calls, and silent errors—and provides a modular architecture that includes an orchestrator, structured tools, memory, guardrails, and full observability. The most valuable takeaway for real deployments is the “start with the least autonomy” principle: Use deterministic workflows first, and add agent behavior only where it’s truly needed. You want to treat agents like serious software systems with testing, logging, and permissions, not clever prompt chains to get them running correctly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx3zrevpc014q94t6c3kn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx3zrevpc014q94t6c3kn.png" alt="AI agent workflow " width="800" height="845"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.digitalocean.com/community/tutorials/rag-not-working-solutions" rel="noopener noreferrer"&gt;Why Your RAG Is Not Working Effectively&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;If your RAG app feels inaccurate or inconsistent, this tutorial helps you diagnose the real cause; it’s usually retrieval quality, chunking strategy, or missing evaluation rather than the model itself. You’ll walk through concrete fixes like better indexing, query rewriting, and relevance filtering so your system actually returns grounded answers. The key takeaway is that RAG performance is mostly a data-pipeline and retrieval-engineering problem, not an LLM problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.digitalocean.com/community/tutorials/connect-google-to-openclaw" rel="noopener noreferrer"&gt;How to Connect Google to OpenClaw&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;If you’re looking for how to connect AI assistants to real-time data, this guide shows how to wire external data sources into your agent workflow so it can act on real user content instead of static prompts. The practical win is learning how authentication, connectors, and permissions shape what your agent can safely do in production. You'll learn how to deploy OpenClaw on a DigitalOcean Droplet and connect it to Google services like Gmail, Calendar, and Drive using OAuth authentication.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.digitalocean.com/community/tutorials/openclaw-next-steps" rel="noopener noreferrer"&gt;So You Installed OpenClaw on a DigitalOcean Droplet. Now What?&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;We’ve penned plenty of resources on how to get started with OpenClaw on DigitalOcean (&lt;a href="https://www.digitalocean.com/community/tutorials/how-to-run-openclaw" rel="noopener noreferrer"&gt;how to run it&lt;/a&gt; and how we built a &lt;a href="https://www.digitalocean.com/blog/technical-dive-openclaw-hardened-1-click-app" rel="noopener noreferrer"&gt;security-hardened Droplet&lt;/a&gt;). This follow-up focuses on moving from a working prototype to a more capable, extensible system. You learn how to layer in new tools, expand automation flows, and structure your project so it scales beyond a demo. The key takeaway is architectural: design your agent environment so new capabilities are plug-and-play rather than requiring rewrites.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.digitalocean.com/community/tutorials/effective-context-engineering-ai-agents" rel="noopener noreferrer"&gt;Effective Context Engineering to Build Better AI Agents&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;The prompts you feed your AI agent matter just as much as the model behind it. Instead of cramming everything into a single prompt, this article shows you how to structure memory, retrieval, tool outputs, and task state so the model always sees the right information at the right time. You’ll see how using enough context is your real control surface for agent reliability, latency, and cost. Good context engineering often beats switching to a larger model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5wiwv68w05r4jzn6l5h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5wiwv68w05r4jzn6l5h.png" alt="Context engineering workflow" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.digitalocean.com/community/tutorials/sliding-window-attention-efficient-long-context-models" rel="noopener noreferrer"&gt;Sliding Window Attention: Efficient Long-Context Modeling&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Sliding window attention makes long-context transformers far more practical by limiting how many tokens each position can “see.” Instead of every token attending to every other token (which gets expensive fast), the model focuses on a fixed local window—cutting compute costs from quadratic to linear growth. You’ll get a breakdown of how this works, how modern variants improve positional awareness, and why it’s especially useful for long documents, extended chat histories, or agent memory systems. Smarter attention design—not just bigger models—is what makes long-context AI scalable.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>agents</category>
      <category>learning</category>
    </item>
    <item>
      <title>How to Run Open-Weight Nemotron 3 Models on a GPU Droplet</title>
      <dc:creator>Jess Lulka</dc:creator>
      <pubDate>Tue, 03 Mar 2026 18:41:16 +0000</pubDate>
      <link>https://forem.com/digitalocean/how-to-run-open-weight-nemotron-3-models-on-a-gpu-droplet-a48</link>
      <guid>https://forem.com/digitalocean/how-to-run-open-weight-nemotron-3-models-on-a-gpu-droplet-a48</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was original written by Andrew Dugan (Senior AI Technical Content Creator II)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.nvidia.com/en-us/" rel="noopener noreferrer"&gt;NVIDIA&lt;/a&gt; has announced the newest additions to their Nemotron family of models, &lt;a href="https://research.nvidia.com/labs/nemotron/Nemotron-3/" rel="noopener noreferrer"&gt;Nemotron 3&lt;/a&gt;. There are three separate models in the Nemotron 3 family that are being released, including Nano, Super, and Ultra, which have 30, 49, and 253 billion parameters respectively with up to 1M tokens in context length. Nano was released in December of 2025, and Super and Ultra are scheduled to be released later in 2026. They are being released on NVIDIA’s &lt;a href="https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-nemotron-open-model-license/" rel="noopener noreferrer"&gt;open model license&lt;/a&gt;, making them available for commercial use and modification and giving you ownership and complete control over generated outputs. Both the weights and &lt;a href="https://huggingface.co/nvidia/datasets?search=nemotron" rel="noopener noreferrer"&gt;training data&lt;/a&gt; are open and available on &lt;a href="https://huggingface.co/nvidia/collections?search=nemotron" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt;. This tutorial will discuss these models and how to deploy the currently available Nano on a &lt;a href="https://www.digitalocean.com/products/gradient/gpu-droplets" rel="noopener noreferrer"&gt;DigitalOcean GPU Droplet&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;NVIDIA has announced Nemotron 3, a new addition to their Nemotron model lineup. Nemotron 3 consists of three new models, Nano (30B), Super (49B), and Ultra (253B).&lt;/li&gt;
&lt;li&gt;As of January, 2026, the smallest model, Nano, is the only one currently available for use. Super and Ultra are scheduled for release later in 2026.&lt;/li&gt;
&lt;li&gt;All of the models are open-weight, allowing for open access for commercial use and modification. The models’ architectures employ novel efficiency improvements to increase model throughput.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Model Overviews
&lt;/h2&gt;

&lt;p&gt;The Nemotron 3 models use a &lt;a href="https://arxiv.org/html/2503.07137v1" rel="noopener noreferrer"&gt;Mixture of Experts&lt;/a&gt; hybrid &lt;a href="https://arxiv.org/abs/2312.00752" rel="noopener noreferrer"&gt;Mamba-Transformer&lt;/a&gt; architecture that is meant to increase the token generation speed, otherwise known as &lt;code&gt;throughput&lt;/code&gt;. This means that the models have fewer layers of self-attention and instead use Mamba-2 (state space model) layers and Mixture-of-Experts (MoE) layers that are computationally less expensive and faster, especially for longer input sequences. This allows the Nemotron 3 models to process longer texts faster while using less memory and resources. Some attention layers are included where needed to keep accuracy as high as possible.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi53w68g3c62j7p2stjrx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi53w68g3c62j7p2stjrx.png" alt="Nemotron Attention" width="800" height="273"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;NVIDIA describes each of the three models as optimized for different platforms. &lt;a href="https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16" rel="noopener noreferrer"&gt;Nano&lt;/a&gt; provides cost efficiency for targeted agentic tasks without sacrificing accuracy. Super offers high accuracy for multi-agentic reasoning. Ultra maximizes reasoning accuracy.&lt;/p&gt;

&lt;p&gt;Nano is the smallest of the three and is comparable to &lt;a href="https://huggingface.co/Qwen/Qwen3-30B-A3B" rel="noopener noreferrer"&gt;Qwen3-30B&lt;/a&gt; and &lt;a href="https://huggingface.co/openai/gpt-oss-20b" rel="noopener noreferrer"&gt;GPT-OSS-20B&lt;/a&gt; in performance. It is the only of the three that is available as of January 2026.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpn3rwv20vay9hed6ic2i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpn3rwv20vay9hed6ic2i.png" alt="Nemotron 3 Nano benchmarks" width="800" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Nano can be used for both reasoning and non-reasoning tasks with an option to turn off the reasoning capabilities through a flag in the chat template. Responses will be less accurate if reasoning is disabled in the configuration.&lt;/p&gt;

&lt;p&gt;Nano is a hybrid Mixture-of-Experts (MoE) architecture that consists of 23 Mamba-2 and MoE layers and six attention layers, with each MoE layer including 128 experts plus one shared expert. Five experts are activated per token, making 3.5 billion of the 30 billion total parameters active.&lt;/p&gt;

&lt;p&gt;Super and Ultra both use &lt;code&gt;LatentMoE&lt;/code&gt; and &lt;a href="https://arxiv.org/pdf/2404.19737" rel="noopener noreferrer"&gt;Multi-Token Prediction&lt;/a&gt; (MTP) layers that further increase text generation speed. MTP is the ability to predict multiple tokens at once in a single forward pass instead of only predicting a single token. LatentMoE is a novel approach to assigning experts that compresses the input data size each expert needs to process in order to reduce the amount of computation for each token. They use these efficiency savings to increase the number of experts that can be used for each token.&lt;/p&gt;

&lt;p&gt;In NVIDIA’s &lt;a href="https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-White-Paper.pdf" rel="noopener noreferrer"&gt;white paper&lt;/a&gt;on the release, they describe Super as optimized for workloads like IT ticket automation where collaborative agents handle large-volume workloads. Ultra is the option to use when accuracy and reasoning performance are paramount.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1 - Creating a GPU Droplet
&lt;/h2&gt;

&lt;p&gt;To deploy the Nemotron 3 Nano on a DigitalOcean GPU Droplet, first, sign in to your DigitalOcean account and create a GPU Droplet.&lt;/p&gt;

&lt;p&gt;Choose AI/ML-Ready as your image and select an NVIDIA H100. Add or select an SSH Key, and &lt;a href="https://cloud.digitalocean.com/registrations/new?activation_redirect=%2Fgpus%2Fnew&amp;amp;redirect_url=%2Fgpus%2Fnew" rel="noopener noreferrer"&gt;create the DigitalOcean Droplet&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2 - Connecting to Your GPU Droplet
&lt;/h2&gt;

&lt;p&gt;Once the DigitalOcean Droplet is created, you &lt;a href="https://www.digitalocean.com/community/tutorials/ssh-essentials-working-with-ssh-servers-clients-and-keys" rel="noopener noreferrer"&gt;SSH&lt;/a&gt;(Secure Shell) into your server instance. Go to your command line and enter the following command, replacing the highlighted &lt;code&gt;your_server_ip&lt;/code&gt; placeholder value with the Public IPv4 of your instance. You can find the IP in the &lt;code&gt;Connection Details&lt;/code&gt; section of your GPU Instance Dashboard.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh root@your_server_ip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You may get a message that reads:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OutputThe authenticity of host 'your_server_ip (your_server_ip)' can't be established.....Are you sure you want to continue connecting (yes/no/[fingerprint])?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you do, you can type &lt;code&gt;yes&lt;/code&gt; and press &lt;code&gt;ENTER&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3 — Installing Python and vLLM
&lt;/h2&gt;

&lt;p&gt;Next, verify you are still in the Linux instance, and install Python.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt install python3 python3-pip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It may notify you that additional space will be used and ask if you want to continue. If it does, type &lt;code&gt;Y&lt;/code&gt; and press &lt;code&gt;ENTER&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you receive a “Daemons using outdated libraries” message asking which services to restart, you can press &lt;code&gt;ENTER&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;After Python has finished installing, &lt;a href="https://docs.vllm.ai/en/latest/" rel="noopener noreferrer"&gt;install vLLM&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install vllm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This package might take a little while to install. After it is finished installing, download the custom parser from Hugging Face.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wget https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/resolve/main/nano_v3_reasoning_parser.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The custom parser interprets Nemotron-3 Nano v3’s reasoning and tool-calling markup so vLLM can correctly serve responses and route tool calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4 - Serving the Nemotron Model
&lt;/h2&gt;

&lt;p&gt;Specify exactly which model you want to serve using the model’s ID from Hugging Face.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vllm serve --model nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
 --max-num-seqs 8 \
  --tensor-parallel-size 1 \
  --max-model-len 262144 \
  --port 8000 \
  --trust-remote-code \
  --reasoning-parser-plugin nano_v3_reasoning_parser.py \
  --reasoning-parser nano_v3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;max-num-seqs&lt;/code&gt; is the maximum number of outputs that can be processed concurrently. You can have up to eight single-output requests processed at a single time in this example.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;tensor-parallel-size&lt;/code&gt; is the number of GPUs you are spreading the model across via tensor parallelism. One is equal to a single GPU. The &lt;code&gt;max-model-len&lt;/code&gt; is the maximum total tokens per request. &lt;code&gt;trust-remote-code&lt;/code&gt; is necessary for Nemotron’s custom chat template and parsing logic.&lt;/p&gt;

&lt;p&gt;Finally, the &lt;code&gt;reasoning-parser-plugin&lt;/code&gt; and &lt;code&gt;reasoning-parser&lt;/code&gt; parameters load and select the custom reasoning parser.&lt;/p&gt;

&lt;p&gt;Once the model is loaded and served on your instance with vLLM, you can make inference calls to the endpoint using Python locally or from another server. Create a Python file called &lt;code&gt;example_vllm_request.py&lt;/code&gt; and run the following code. Replace &lt;code&gt;your_server_ip&lt;/code&gt; with the IP address of your GPU Droplet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

url = "http://your_server_ip:8000/v1/completions"
data = {
    "model": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "max_tokens": 1000
}

response = requests.post(url, json=data)
message = response.json()['choices'][0]['message']['content']
print(message)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will see output similar to the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Output
The capital of France is **Paris**.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you print out the entire &lt;code&gt;response.json()&lt;/code&gt; object, you can view the reasoning tokens. If you would like to run it with reasoning disabled, you can add a &lt;code&gt;chat_template_kwargs&lt;/code&gt; parameter to the data object above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data = {
    "model": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "max_tokens": 1000,
    "chat_template_kwargs": {"enable_thinking": False},
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What are the hardware requirements and GPU memory needed to run Nemotron 3 Nano locally?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Nemotron 3 Nano can run on GPUs with at least 60 GB of VRAM in BF16 precision, such as an A100 80 GB or &lt;a href="https://www.digitalocean.com/community/tutorials/what-is-an-nvidia-h100" rel="noopener noreferrer"&gt;H100&lt;/a&gt;. A quantized version may allow it to run on GPUs with less memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I fine-tune Nemotron 3 Nano on my own data, and what are the licensing implications?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, Nemotron 3 models are released under NVIDIA’s open model license, which permits commercial use, modification, and fine-tuning. You retain complete ownership of any outputs generated and can fine-tune the model on custom datasets for your specific use cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is the difference between the Mixture-of-Experts (MoE) architecture and traditional transformer models in terms of inference cost?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The MoE architecture only activates five out of 128 experts per token (3.5B of 30B parameters), making inference much more efficient than traditional dense models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Has NVIDIA released other LLMs in the past?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, NVIDIA has released a large number of open models and datasets, including other Nemotron model versions, Megatron, ASR models, and more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Nemotron 3 family is a comparatively effective and efficient open model with fast inference and accurate results. The smallest version, Nano, is currently available as of January 2026, and the other two larger versions will become available in coming months.&lt;/p&gt;

&lt;p&gt;In this tutorial, you deployed Nemotron 3 Nano on a DigitalOcean GPU Droplet. Next, you can build a workflow that uses it for any data sensitive applications that require a high degree of privacy and control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/mistral-3-models" rel="noopener noreferrer"&gt;Mistral 3 Models on DigitalOcean&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/how-to-build-parallel-agentic-workflows-with-python" rel="noopener noreferrer"&gt;How to Build Parallel Agentic Workflows with Python&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/run-gpt-oss-vllm-amd-gpu-droplet-rocm" rel="noopener noreferrer"&gt;Run gpt-oss 120B on vLLM with an AMD Instinct MI300X GPU Droplet&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>nvidia</category>
      <category>ai</category>
      <category>learning</category>
      <category>aimodels</category>
    </item>
    <item>
      <title>Technical Deep Dive: How we Created a Security-hardened 1-Click Deploy OpenClaw</title>
      <dc:creator>Jess Lulka</dc:creator>
      <pubDate>Tue, 24 Feb 2026 22:11:29 +0000</pubDate>
      <link>https://forem.com/digitalocean/technical-deep-dive-how-we-created-a-security-hardened-1-click-deploy-openclaw-4b99</link>
      <guid>https://forem.com/digitalocean/technical-deep-dive-how-we-created-a-security-hardened-1-click-deploy-openclaw-4b99</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally written by Freddie Rice (Staff Product Security Engineer)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.digitalocean.com/resources/articles/what-is-moltbot" rel="noopener noreferrer"&gt;OpenClaw, an open source AI assistant&lt;/a&gt; (&lt;a href="https://x.com/openclaw/status/2017103710959075434" rel="noopener noreferrer"&gt;recently renamed from Moltbot&lt;/a&gt;, and earlier Clawdbot), has exploded in popularity over the last few days, and at DigitalOcean we immediately wondered “how can we enable more people to try this new technology safely and easily?” We noticed that there was a lot of interest by folks looking to use this software, but also that there was concern around the security of the open source software, especially when connecting it directly to users’ own machines. We dug in to find a way to deliver this software to our customers as fast as possible, as easily as possible and as safe as possible.&lt;/p&gt;

&lt;p&gt;At DigitalOcean, our &lt;a href="https://marketplace.digitalocean.com/apps/moltbot" rel="noopener noreferrer"&gt;1-Click Deploy OpenClaw&lt;/a&gt; (formerly 1-Click Deploy Moltbot) through our Marketplace enables us to package the latest and greatest software configured for our Droplet® server, and make it easily available to customers. Creating our 1-Click Deploy OpenClaw was the natural next step in getting this to our customers.&lt;/p&gt;

&lt;p&gt;Toying around with OpenClaw on a local machine is fun, but it could severely impact the ability to deploy and use the software for longer term use and may not meet the safe environment that you need. Some of the benefits to deploying on DigitalOcean include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always available – the service is available to customers via the web&lt;/li&gt;
&lt;li&gt;Easy to connect to it – Droplets have a static ip address&lt;/li&gt;
&lt;li&gt;Vertical scalability – scale up CPUs, memory, and disk storage with higher workloads&lt;/li&gt;
&lt;li&gt;Cognitive overload – start with basic configs, tweak the ones that matter to you&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We made a lot of changes as we built the 1-Click Deploy OpenClaw, but the main elements we focused on were&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How do we communicate with the service safely?&lt;/li&gt;
&lt;li&gt;How do we keep the agentic code isolated from the rest of the system?&lt;/li&gt;
&lt;li&gt;How do we prevent attacks from the wider internet?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of that while providing a straightforward deployment UX to our customers! Let’s dig in…&lt;/p&gt;

&lt;h3&gt;
  
  
  Delivering an Image with Safe Defaults
&lt;/h3&gt;

&lt;p&gt;Our priority in creating a 1-Click Deploy OpenClaw on our Droplet was twofold: First, speed, as we wanted to get something out quickly to our users. Second was providing a solution that provided additional security benefits. These are the actions we took to meet those goals:&lt;/p&gt;

&lt;h3&gt;
  
  
  Keeping deployments consistent (DevOps)
&lt;/h3&gt;

&lt;p&gt;We saw that there are multiple ways to deploy the software – we chose the most consistent path, which was picking a stable release from the Git repository on GitHub, pulling it and building from there.&lt;/p&gt;

&lt;p&gt;Why not pull the latest and greatest from main? Changes are happening at a rapid pace, which is awesome for feature development but can come at the expense of stability. Depending on the minute we build our 1-click image, we could get a working version or a broken version.&lt;/p&gt;

&lt;p&gt;So we make sure that we can deliver the latest stable version.&lt;/p&gt;

&lt;h3&gt;
  
  
  TLS (Keep communications safe and auditable)
&lt;/h3&gt;

&lt;p&gt;Once we had a Packer image that we could iterate on, we applied our security best practices for the 1-clicks to set up TLS. This is a crucial step to make sure that our customers can communicate with the bot in a safe way that doesn’t allow eavesdropping.&lt;/p&gt;

&lt;p&gt;Our best practices consist of using Caddy as a reverse proxy with a TLS certificate issued by LetsEncrypt. Caddy ensures that the service is deployed externally is the service we want to publish and provides a safe channel with which to serve it. Furthermore, Caddy outputs logs to a location that can be audited after the fact, allowing the end user to see how their service is actually being used.&lt;/p&gt;

&lt;p&gt;A new UX improvement we added to this image is seamless TLS configuration with LetsEncrypt via IP addresses without requiring a domain name! While OpenClaw spins up, Caddy is requesting a new certificate on your behalf, no configuration required.&lt;/p&gt;

&lt;h3&gt;
  
  
  Authz (Gateway Key + Pairing)
&lt;/h3&gt;

&lt;p&gt;How do we know that the requests are coming from you? We have a OpenClaw gateway key in place to make sure that the user who is supposed to use the platform is the correct one.&lt;/p&gt;

&lt;p&gt;Next, we leaned into a feature that OpenClaw provides called “Paring” – this exists to make sure that the devices that are going to communicate with the main server are the trusted ones.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sandboxing (keep safe from Agents)
&lt;/h3&gt;

&lt;p&gt;Part of the configuration is an Anthropic / OpenAI / Model key – these are sensitive pieces of material that are required in order to allow the software to function! So how do we let agents that can run arbitrary code on the machine, not read and abuse these tokens?&lt;/p&gt;

&lt;p&gt;Furthermore, how do we stop the agents from potentially destroying the machine itself?&lt;/p&gt;

&lt;p&gt;Luckily, there is a configuration available that puts the agent deployments into their own containers. If an agent blows up, it will destroy its own ephemeral docker container, but the host filesystem will still be safe from incorrect agentic modifications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Safe Defaults
&lt;/h3&gt;

&lt;p&gt;These boxes are taking the best configurations that we implement for all of our 1-clicks, including but not limited to:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fail2ban&lt;/strong&gt; – Make sure that the background noise of the internet doesn’t cause disruptions to your droplet. It does this by monitoring the logs of failed requests to the system and dynamically updating firewall rules to block known bad patterns on the internet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unattended upgrades&lt;/strong&gt; – We want to make sure that your Droplet is always up to date. We have Ubuntu configured with unattended upgrades that periodically will check for vulnerable packages and automatically patch them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment Constraints and Upcoming Features
&lt;/h2&gt;

&lt;p&gt;To ensure a stable and repeatable installation, we utilize Packer for our image provisioning; however, during testing, we found that smaller Droplet configurations consistently encountered out-of-memory errors during the snapshot creation process. While this currently necessitates a minimum $24/month Droplet size to match the snapshot’s disk and memory requirements, we chose to prioritize getting this tool into your hands today rather than delaying for further optimization. We are already iterating on the image to reduce its footprint and support lower-cost tiers, and in the spirit of transparency, we have &lt;a href="https://github.com/digitalocean/droplet-1-clicks" rel="noopener noreferrer"&gt;made our Packer scripts public&lt;/a&gt; so you can audit the provisioning process and gain confidence in the one-click experience. We are also working to quickly add support for all DigitalOcean Gradient AI models, including OpenAI, add auto provisioning of Gradient AI API Key and injecting for the user, and more updates as OpenClaw evolves over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  After deploy (make it yours!)
&lt;/h3&gt;

&lt;p&gt;1-Click Deploy OpenClaw is a great launch point, but OpenClaw is infinitely customizable once up and running in the Droplet. Choose which messaging platforms are the best fit for your workflows, and get chatting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get started with the 1-Click Deploy OpenClaw
&lt;/h2&gt;

&lt;p&gt;Get started with the &lt;a href="https://marketplace.digitalocean.com/apps/moltbot" rel="noopener noreferrer"&gt;1-Click Deploy OpenClaw by visiting the Marketplace&lt;/a&gt;, and &lt;a href="https://www.digitalocean.com/community/tutorials/how-to-run-moltbot" rel="noopener noreferrer"&gt;follow this tutorial&lt;/a&gt; for step by step instructions on how to get started.&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>digitalocean</category>
      <category>security</category>
      <category>ai</category>
    </item>
    <item>
      <title>4 AI Models (That aren’t Opus 4.6) on Our Minds This Week</title>
      <dc:creator>Jess Lulka</dc:creator>
      <pubDate>Mon, 23 Feb 2026 19:59:44 +0000</pubDate>
      <link>https://forem.com/digitalocean/4-ai-models-that-arent-opus-46-on-our-minds-this-week-h1l</link>
      <guid>https://forem.com/digitalocean/4-ai-models-that-arent-opus-46-on-our-minds-this-week-h1l</guid>
      <description>&lt;p&gt;So many models, so little time. Today, we’re bringing our attention to some super cool releases from Qwen, MiniCPM-o, ACE-Step, and GLM-OCR. So what can these models do?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://huggingface.co/Qwen/Qwen3-Coder-Next" rel="noopener noreferrer"&gt;Qwen3-Coder-Next&lt;/a&gt;: An open-weight model built for coding agents and local development that speeds up deployments just as well as more compute-hungry models. By activating just 3B parameters out of 80B total, the model can rival models that require far more compute, making large-scale deployment markedly more economical. The model is also trained for durable agent behavior, including long-horizon reasoning, sophisticated tool use, and recovery from failed executions, and, with a 256k context window plus flexible scaffold support, is designed to integrate smoothly into a wide range of existing CLI and IDE workflows.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://huggingface.co/openbmb/MiniCPM-o-4_5" rel="noopener noreferrer"&gt;MiniCPM-o 4.5&lt;/a&gt;: A game-changer for vision performance. The most advanced release in the MiniCPM-o line, packaging a 9B-parameter end-to-end architecture derived from SigLip2, Whisper-medium, CosyVoice2, and Qwen3-8B while adding full-duplex multimodal streaming. The model delivers leading vision performance that rivals or surpasses much larger proprietary systems, supports unified instruction and reasoning modes, and enables natural bilingual real-time speech with expressive voices, cloning, and role play. A major addition is simultaneous video/audio input with concurrent text and speech output, allowing the system to see, listen, talk, and even act proactively in live scenarios. It further strengthens OCR and document understanding, handles high-resolution images and high-FPS video efficiently, supports 30+ languages, and is easy to deploy across local and production environments through broad tooling, quantization options, and ready-to-run inference frameworks.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://huggingface.co/ACE-Step/Ace-Step1.5" rel="noopener noreferrer"&gt;ACE-Step 1.5&lt;/a&gt;: An open-source and legally compliant music foundation model built to deliver commercial-grade generation on everyday hardware, enabling creators to safely use outputs in professional projects. Trained on a large, legally compliant mix of licensed, royalty-free, and synthetic data, the system can produce complete songs in seconds while running locally on GPUs with under 4GB of VRAM. Its hybrid design uses a language model as an intelligent planner that turns prompts into detailed musical blueprints—covering structure, lyrics, and metadata—which are realized by a diffusion transformer, aligned through intrinsic reinforcement learning rather than external reward models. Beyond raw synthesis, ACE-Step v1.5 supports fine stylistic control, multilingual prompting, and flexible editing workflows such as covers, repainting, and vocal-to-instrumental conversion.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://huggingface.co/zai-org/GLM-OCR" rel="noopener noreferrer"&gt;GLM-OCR&lt;/a&gt;: A multimodal system for advanced document understanding built on the GLM-V encoder–decoder framework. To boost learning efficiency, accuracy, and transferability, it incorporates Multi-Token Prediction (MTP) objectives together with a stable, end-to-end reinforcement learning strategy across tasks. The architecture combines a CogViT visual backbone pre-trained on large image-text corpora, a streamlined cross-modal bridge that aggressively downsamples tokens for efficiency, and a GLM 0.5B language decoder for text generation. Paired with a two-stage workflow, layout parsing followed by parallel recognition using PP-DocLayout-V3, the model achieves reliable, high-fidelity OCR results across a wide spectrum of complex document structures.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They may not have the marketing dazzle of Anthropic’s flagship model, but these four have an incredible amount of potential to help clear some vexing development issues. What models are you keeping an eye on? Add them in the comments. &lt;/p&gt;

</description>
      <category>aimodels</category>
      <category>qwen</category>
      <category>learning</category>
      <category>huggingface</category>
    </item>
    <item>
      <title>How to Lower Your AI Costs When Scaling Your Business</title>
      <dc:creator>Jess Lulka</dc:creator>
      <pubDate>Fri, 20 Feb 2026 20:31:09 +0000</pubDate>
      <link>https://forem.com/digitalocean/how-to-lower-your-ai-costs-when-scaling-your-business-4i9k</link>
      <guid>https://forem.com/digitalocean/how-to-lower-your-ai-costs-when-scaling-your-business-4i9k</guid>
      <description>&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/yTfkZ-Eusc8"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;As AI adoption grows, technological maintenance isn’t the only component you need to keep up with; your budget also requires a watchful eye. Especially when inference workloads can scale data—and costs—quickly. Your AI inference bill comes down to three things: the hardware you use, the scale you need, and how fast it generates output.&lt;/p&gt;

&lt;p&gt;If you’re curious how you can lower LLM inference spending, here are three tips to reduce your overall AI costs as you scale:&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Diversify your hardware
&lt;/h2&gt;

&lt;p&gt;Hardware is a major reason AI has historically been expensive: the only processing units available to run these workloads are GPUs, and demand exceeded supply (driving up costs). This is true for consumer-grade GPUs, where it's not uncommon to see prices two or three times above MSRP, and data center GPU scarcity is even worse.&lt;/p&gt;

&lt;p&gt;For a long time, NVIDIA held a large market share with its physical hardware and &lt;a href="https://www.digitalocean.com/community/tutorials/intro-to-cuda" rel="noopener noreferrer"&gt;compute unified device architecture (CUDA)&lt;/a&gt;-only frameworks. AMD has since introduced open-source ROCm and made it easier for teams to expand the hardware types they can use for their AI workloads, increasing GPU supply and reducing vendor lock-in. &lt;/p&gt;

&lt;h2&gt;
  
  
  2. Configuration (Model + KV cache) and quantization
&lt;/h2&gt;

&lt;p&gt;When running LLM inference, pay attention to GPU capacity and speed, as they affect overall performance. You need a minimum amount of memory to even load and run a model. Additional capacity beyond that allows you to have a &lt;a href="https://www.youtube.com/shorts/-hv0a_EXWuQ" rel="noopener noreferrer"&gt;bigger KV cache&lt;/a&gt;, which is critical to high-throughput performance; the KV cache stores the history of each conversation for each user that the GPU is currently serving. Without it, your token generation becomes slower, and inference speed slows down. With it, you can serve more users at once and keep token generation steady. &lt;/p&gt;

&lt;p&gt;Beyond using a KV cache and optimizing your model, consider quantization. This practice reduces precision, so less memory (or VRAM) is required to store tokens. A 5000-token conversation, for example, will take several gigabytes of GPU memory (VRAM) to store. These gigabytes contain a massive amount of numbers that the GPU reuses during inference. Each of these numbers requires 2 bytes of memory when using the default 16-bit precision. With 8-bit precision, you only need 1 byte to store the same number of tokens and reduce the overall memory requirements by half. Though your hardware must support 8-bit models for this to work effectively. &lt;/p&gt;

&lt;h2&gt;
  
  
  3. Optimize your parallelism setup
&lt;/h2&gt;

&lt;p&gt;AI production workloads are massive and require gigabytes (or even terabytes) to just load models. Even if you could load a single model onto a GPU that supports 8-bit models, there’s no guarantee that you would be able to successfully have enough memory to run your model and associated activations (calculations the LLM does during inference) on just one GPU. This is where &lt;a href="https://www.digitalocean.com/community/tutorials/splitting-llms-across-multiple-gpus" rel="noopener noreferrer"&gt;tensor parallelism&lt;/a&gt; and &lt;a href="https://www.digitalocean.com/community/conceptual-articles/data-parallelism-distributed-training" rel="noopener noreferrer"&gt;data parallelism&lt;/a&gt; improve performance. &lt;/p&gt;

&lt;p&gt;When you spread your LLM models across multiple GPUs, you reduce the overall calculations (and memory) required per GPU, leaving plenty of room for activations and the KV cache. If you choose to apply this technique, consider the technical overhead of GPU data coordination and synchronization. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you’re curious to see a practical application of these techniques, you can read our full &lt;a href="https://www.digitalocean.com/blog/technical-deep-dive-character-ai-amd" rel="noopener noreferrer"&gt;Character.ai case study&lt;/a&gt; for a technical deep dive. With these workflows in place, the company reduced its inference costs by 50% while continuing to support an app with 10s of millions of users.&lt;/em&gt;   &lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>inference</category>
    </item>
    <item>
      <title>How to Run OpenClaw with DigitalOcean</title>
      <dc:creator>Jess Lulka</dc:creator>
      <pubDate>Tue, 10 Feb 2026 18:45:07 +0000</pubDate>
      <link>https://forem.com/digitalocean/how-to-run-openclaw-with-digitalocean-3mpb</link>
      <guid>https://forem.com/digitalocean/how-to-run-openclaw-with-digitalocean-3mpb</guid>
      <description>&lt;p&gt;&lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; (formerly known as Moltbot and Clawdbot) is an open-source, self-hosted personal AI assistant that can run directly on your computer. It can execute a variety of tasks, such as managing your calendar, browsing the web, organizing files, managing your email, and running terminal commands. It supports any Large Language Model (LLM), and you can communicate with it through standard chat apps that you already use like WhatsApp, iMessage, Telegram, Discord, or Slack.&lt;/p&gt;

&lt;p&gt;While it is technically possible to run OpenClaw on your local machine, security concerns arise when giving an AI agent open access to your computer with all of your personal data on it. A better approach is to deploy it on a separate machine specifically for OpenClaw or to deploy it on a cloud server.&lt;/p&gt;

&lt;p&gt;There are 3 ways to deploy OpenClaw with DigitalOcean. &lt;a href="https://www.digitalocean.com/community/tutorials/moltbot-quickstart-guide" rel="noopener noreferrer"&gt;You can either deploy yourself on a DigitalOcean Droplet&lt;/a&gt;, deploy with a pre-built 1-Click Application in the Droplet Marketplace, or use the DigitalOcean App Platform. Each of these options will have different security and maintenance considerations, so choose an option based on your app’s needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to choose each deployment option
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bare DigitalOcean Droplet&lt;/strong&gt;: Deploy directly on a DigitalOcean Droplet only if you require full control over server configuration and are comfortable managing security hardening manually.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1-Click Application&lt;/strong&gt;: A 1-Click Application is best for solo developers who want improved security benefits with a fast, self-contained deployment with maximal control, and minimal abstraction. This requires minimal decisions and setup. It is a great option for fast experimentation, but it is not as scalable as the App Platform option.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;App Platform&lt;/strong&gt;: The App Platform is best for teams with a production-level deployment that requires long-term operational maturity. For example, if you need to scale quickly (horizontal auto-scaling), need operational consistency with automatic restarts, zero-downtime deploys, or sleep mode requirements for cost optimization.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both the 1-Click Application and the App Platform deployment options will be covered in this tutorial below. If you prefer to manually deploy OpenClaw on a DigitalOcean Droplet without a 1-Click Application or the App Platform, you can follow the &lt;a href="https://www.digitalocean.com/community/tutorials/moltbot-quickstart-guide" rel="noopener noreferrer"&gt;Quickstart Guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The DigitalOcean 1-Click and App Platform deployments handle many of the security best practices for you automatically. These security enhancements include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Authenticated communication&lt;/strong&gt;: Droplets generate an OpenClaw gateway token, so communication with your OpenClaw is authenticated, essentially protecting your instance from unauthorized users.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardened firewall rules&lt;/strong&gt;: Droplets harden your server with default firewall rules that rate-limit OpenClaw ports to prevent inappropriate traffic from interfering with your OpenClaw use and to help prevent denial-of-service attacks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-root user execution&lt;/strong&gt;: Droplets run OpenClaw as a non-root user on the server, limiting the attack surface if an inappropriate command is executed by OpenClaw.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker container isolation&lt;/strong&gt;: Droplets run OpenClaw inside Docker containers on your server, setting up an &lt;a href="https://docs.openclaw.ai/gateway/sandboxing" rel="noopener noreferrer"&gt;isolated sandbox&lt;/a&gt; and further preventing unintended commands from impacting your server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Private DM pairing&lt;/strong&gt;: Droplets configure &lt;a href="https://docs.openclaw.ai/start/pairing" rel="noopener noreferrer"&gt;Direct Message (DM) pairing&lt;/a&gt; by default, which prevents unauthorized individuals from being able to talk to your OpenClaw.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While deploying this way on a cloud server offers security benefits, OpenClaw is still quite new. Like many new tools, it might have architectural characteristics that have not been designed to work with additional security features yet. Therefore, with added security features, some of OpenClaw’s functionality may not function as perfectly as it was intended. For example, some skills might not work out-of-the-box and can require some additional manual set up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;OpenClaw is a powerful, self-hosted AI assistant that can execute tasks like managing calendars, browsing the web, and running terminal commands. It should not be run on your personal machine due to significant security risks associated with giving an AI agent high-level system access.&lt;/li&gt;
&lt;li&gt;Deploying OpenClaw on a DigitalOcean 1-Click Application or on the App Platform provides a safer environment through security features like authenticated communication, hardened firewall rules, non-root user execution, Docker container isolation, and private Direct Message (DM) pairing.&lt;/li&gt;
&lt;li&gt;OpenClaw is model-agnostic and supports various LLMs via Application Programming Interface (API) keys or local deployment, making it flexible for different use cases and preferences.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this tutorial, you will first deploy an OpenClaw instance onto DigitalOcean’s 1-Click Deploy OpenClaw. Then you will deploy an instance on the App Platform. If you only need to be able to deploy on the App Platform, skip ahead to that section below.&lt;/p&gt;

&lt;h2&gt;
  
  
  1-Click Application
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1 — Creating an OpenClaw Droplet
&lt;/h3&gt;

&lt;p&gt;First, sign in to your DigitalOcean account and create a &lt;a href="https://cloud.digitalocean.com/droplets/new" rel="noopener noreferrer"&gt;Droplet&lt;/a&gt;. On the Create Droplets page in the DigitalOcean Control Panel, under &lt;code&gt;Region&lt;/code&gt;, select the region closest to you. Under &lt;code&gt;Choose an Image&lt;/code&gt;, select the &lt;code&gt;Marketplace&lt;/code&gt; tab.&lt;/p&gt;

&lt;p&gt;In the search bar, type &lt;code&gt;OpenClaw&lt;/code&gt; and select the OpenClaw image from the search results.&lt;/p&gt;

&lt;p&gt;Next, choose a Droplet plan. The Basic plan with at least 4GB of RAM (such as the &lt;code&gt;s-2vcpu-4gb&lt;/code&gt; size) is recommended for running OpenClaw effectively.&lt;/p&gt;

&lt;p&gt;Under &lt;code&gt;Authentication&lt;/code&gt;, select &lt;code&gt;SSH Key&lt;/code&gt; and add your SSH key if you haven’t already. If you need to create an SSH key, follow the instructions in &lt;a href="https://docs.digitalocean.com/products/droplets/how-to/add-ssh-keys/" rel="noopener noreferrer"&gt;How to Add SSH Keys to New or Existing Droplets&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Finally, give your Droplet a hostname (such as &lt;code&gt;OpenClaw-server&lt;/code&gt;), and click &lt;code&gt;Create Droplet&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Alternatively, you can create an OpenClaw Droplet using the DigitalOcean API. To create a 4GB OpenClaw Droplet in the NYC3 region, use the following curl command. You’ll need to either save your &lt;a href="https://docs.digitalocean.com/reference/api/create-personal-access-token/" rel="noopener noreferrer"&gt;API access token&lt;/a&gt; to an environment variable or substitute it into the command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST -H 'Content-Type: application/json' \
     -H 'Authorization: Bearer '$TOKEN'' -d \
    '{"name":"choose_a_name","region":"nyc3","size":"s-2vcpu-4gb","image":"openclaw"}' \
    "https://api.digitalocean.com/v2/droplets"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once your Droplet is created, it takes a few minutes to fully initialize. After initialization, you can SSH into your Droplet using the IPv4 address shown in your DigitalOcean dashboard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh root@your_droplet_ip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;your_droplet_ip&lt;/code&gt; with your Droplet’s actual IP address.&lt;/p&gt;

&lt;p&gt;Once logged in, the OpenClaw installation will be ready to configure. The DigitalOcean 1-Click Deploy OpenClaw includes OpenClaw version 2026.1.24-1 pre-installed with all necessary dependencies.&lt;/p&gt;

&lt;p&gt;You will see a welcome message from OpenClaw. Under the Control UI &amp;amp; Gateway Access section, you will see a Dashboard URL. Note the Dashboard URL value. You will use it later to access the GUI in your browser.&lt;/p&gt;

&lt;p&gt;You will see a welcome message from OpenClaw. Under the &lt;code&gt;Control UI &amp;amp; Gateway Access&lt;/code&gt; section, you will see a &lt;code&gt;Dashboard URL&lt;/code&gt;. Note the Dashboard URL value. You will use it later to access the GUI in your browser.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg8ntl1pi8pdprid3766h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg8ntl1pi8pdprid3766h.png" alt="Droplet Dashboard URL" width="800" height="289"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Within the terminal, choose &lt;code&gt;Anthropic&lt;/code&gt; as your AI Provider. If you have access to Gradient AI, you can select that option. OpenAI models will be available soon. Once you select your provider, provide the respective API/Secret key.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2 — Using OpenClaw
&lt;/h3&gt;

&lt;p&gt;With the 1-Click Application, there are 2 ways to use OpenClaw: you can either use the Graphical User Interface (GUI) through your browser, or you can use the Text User Interface (TUI) through your terminal.&lt;/p&gt;

&lt;p&gt;After entering your API key, it may ask if you want to run pairing automation now. This pairing is to give you access to the UI dashboard (GUI). If you would like to use the GUI, you can type &lt;code&gt;yes&lt;/code&gt; and enter.&lt;/p&gt;

&lt;p&gt;It will then provide you with a url. Open a browser, and in the URL bar, paste the provided URL from OpenClaw. This will open an OpenClaw GUI directly in your browser using the Gateway token to authenticate you for additional security. You will then need to go back to the terminal, type &lt;code&gt;continue&lt;/code&gt;, and press enter to continue the automated pairing.&lt;/p&gt;

&lt;p&gt;In your browser, click refresh, and you will be directed to the default chat page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9tdgpkrl1t6la8swgcc1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9tdgpkrl1t6la8swgcc1.png" alt="OpenClaw Chat Page" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here you can type a message and send, and OpenClaw will respond. For example, if you ask about what files it can see, it will tell you.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input
What files can you currently see on my computer?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Output
Here’s a list of the files and directories currently visible in the sandbox workspace:
.
├── AGENTS.md
├── BOOTSTRAP.md
├── HEARTBEAT.md
├── USER.md
└── skills
    ├── 1password
    │   ├── SKILL.md
    │   └── references
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From the GUI, you can review the bot’s usage, add communication channels, schedule cron jobs, add skills, and manage all aspects of OpenClaw.&lt;/p&gt;

&lt;p&gt;To use the TUI, run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/opt/openclaw-tui.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: Depending on the version, the script may also be located at &lt;code&gt;/opt/clawdbot-tui.sh&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You’ve now successfully deployed OpenClaw on DigitalOcean and accessed it through a web browser. From here, you can explore additional OpenClaw capabilities, such as browsing the web, managing files, or executing terminal commands on your Droplet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3 - Installing Skills with the 1-Click Application
&lt;/h3&gt;

&lt;p&gt;OpenClaw comes with over 50 skills automatically loaded in the skill registry. You can install skills in the GUI by navigating to the &lt;code&gt;Skills&lt;/code&gt; section in the browser dashboard. For example, to integrate with Google Calendar, search for calendar, and click on &lt;code&gt;Install&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flpx52jlcjrjopoygilu7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flpx52jlcjrjopoygilu7.png" alt="OpenClaw Skills" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A large number of skills are available to perform a wide range of tasks including managing your files, automating web browsing, monitoring health and smart home technologies, and managing social media communication. Read through &lt;a href="https://www.digitalocean.com/resources/articles/what-is-moltbot" rel="noopener noreferrer"&gt;What is OpenClaw?&lt;/a&gt; for an overview of how OpenClaw works and what OpenClaw’s capabilities are.&lt;/p&gt;

&lt;h2&gt;
  
  
  App Platform
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1 - Creating the OpenClaw App
&lt;/h3&gt;

&lt;p&gt;Deploying through DigitalOcean’s App Platform follows a slightly different process. First, go to the &lt;a href="https://github.com/digitalocean-labs/openclaw-appplatform" rel="noopener noreferrer"&gt;OpenClaw App Platform repo&lt;/a&gt; and click on the &lt;code&gt;Deploy to Digital Ocean&lt;/code&gt; button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2b7qaftpj0wd1q88jlch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2b7qaftpj0wd1q88jlch.png" alt="Deploy OpenClaw to DigitalOcean" width="800" height="606"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sign in or Create an Account. Scroll down in the &lt;code&gt;Environment Variables&lt;/code&gt; section and click &lt;code&gt;Edit&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7pu43iutgmfv7ipx583c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7pu43iutgmfv7ipx583c.png" alt="Editing Environmental Variables" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Add your &lt;a href="https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#keys" rel="noopener noreferrer"&gt;model access key&lt;/a&gt; to the &lt;code&gt;GRADIENT_API_KEY&lt;/code&gt; parameter. This will allow you to use your Gradient AI Serverless Inference account for the OpenClaw bot. Finally, click on &lt;code&gt;Create APP&lt;/code&gt;. It can take up to 5 minutes to finish building the app.&lt;/p&gt;

&lt;p&gt;After it has finished building, go to the &lt;code&gt;Console&lt;/code&gt; tab of the app, and confirm it is working by typing in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;openclaw gateway health --url ws://127.0.0.1:18789
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You now have a working OpenClaw App. If you would like to connect to it remotely, you will need to &lt;a href="https://docs.digitalocean.com/reference/doctl/how-to/install/" rel="noopener noreferrer"&gt;install and configure doctl&lt;/a&gt;, the official command line interface (CLI) for the DigitalOcean API. You will need to follow the instructions to create an API token, use the API token to grant account access to doctl, then use the &lt;a href="https://docs.digitalocean.com/reference/doctl/reference/apps/console/" rel="noopener noreferrer"&gt;doctl apps console&lt;/a&gt; command to initiate a console session for the app. This step is not completely necessary because you can just use the &lt;code&gt;Console&lt;/code&gt; in your application through the DigitalOcean website.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2 - Connecting OpenClaw to WhatsApp
&lt;/h3&gt;

&lt;p&gt;When you initiate the console session, you will be accessing it as the root user. So first, you need to change to the openclaw user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;su openclaw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then you need to change directories into the &lt;code&gt;home&lt;/code&gt; directory of the openclaw user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The &lt;code&gt;cd&lt;/code&gt; command without arguments changes to the current user’s home directory. Since you’re now the &lt;code&gt;openclaw&lt;/code&gt; user, this will navigate to &lt;code&gt;/home/openclaw&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To connect your OpenClaw application to WhatsApp, go to the &lt;code&gt;Console&lt;/code&gt; of your OpenClaw application enter the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;openclaw channels login --channel whatsapp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Follow the instructions to scan the QR code and connect with your bot through WhatsApp.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3 - Installing Skills with the App Platform Application
&lt;/h3&gt;

&lt;p&gt;To install skills through the App Platform application, from the &lt;code&gt;openclaw&lt;/code&gt; user in the console, browse skills with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;openclaw skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Find a skill that you would like to use and install it with the following command, replacing &lt;code&gt;skill_name&lt;/code&gt; with the name of the skill you would like to install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npx clawhub install &amp;lt;skill_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You now have an OpenClaw App Platform application with a WhatsApp connection and skills. You can execute the &lt;code&gt;openclaw&lt;/code&gt; command to access the rest of openclaw’s features.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Can I use a model other than Claude?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, OpenClaw is designed to be model-agnostic, so it does support models other than Anthropic’s Claude. It allows users to use various Large Language Models (LLMs) via API keys or locally. However, note that using the DigitalOcean 1-Click Deploy OpenClaw as outlined above, most users will only be able to use Anthropic (support for OpenAI coming soon!).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I deploy it on other operating systems that are not Linux?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, you can deploy OpenClaw on &lt;a href="https://docs.openclaw.ai/platforms/windows" rel="noopener noreferrer"&gt;Windows&lt;/a&gt;, &lt;a href="https://docs.openclaw.ai/platforms/macos" rel="noopener noreferrer"&gt;macOS&lt;/a&gt;, and &lt;a href="https://docs.openclaw.ai/platforms/linux" rel="noopener noreferrer"&gt;Linux&lt;/a&gt;, and &lt;a href="https://docs.openclaw.ai/platforms" rel="noopener noreferrer"&gt;other platforms&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are the main security concerns?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The main security concerns are its high-level system access, potential for misconfiguration, and its ability to execute arbitrary code that might be harmful to your system. It’s important to be aware of the environment in which it’s deployed and the access it has.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I give API Key access to my Agents?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It is possible to selectively give agents more control over the world around you. The default OpenClaw application will keep these keys together in an environment that is available to all agents. This configuration gives you control to inject the keys you want to the agents that should have those powers. On the “Agents” Menu bar, select the Agent you’d like to grant access (or “Defaults” for all), then under Sandbox &amp;gt; Docker &amp;gt; Env, add the selected API Keys that should be used.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does pricing work with OpenClaw?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OpenClaw is free and open-source to download and use, but you are paying for the LLM tokens. Therefore, the price depends on your usage. You should be careful with this because with scheduled jobs or other functionality, the costs can increase quickly and unexpectedly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this tutorial, you deployed OpenClaw on DigitalOcean, creating a secure environment for your personal AI assistant. By running OpenClaw on a cloud server instead of your local machine, you’ve significantly reduced security risks while maintaining the full functionality of this powerful tool.&lt;/p&gt;

&lt;p&gt;The DigitalOcean OpenClaw deployment provides critical security features out of the box—including authenticated communication, hardened firewall rules, Docker container isolation, and non-root user execution—that make it safer to experiment with AI agent capabilities. You accessed it through a web browser and can now execute various tasks through your preferred messaging apps. Next, try adding new skills to your OpenClaw instance and customize the app to best suit your agentic needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/moltbot-quickstart-guide" rel="noopener noreferrer"&gt;OpenClaw Quickstart Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/how-to-build-parallel-agentic-workflows-with-python" rel="noopener noreferrer"&gt;How to Build Parallel Agentic Workflows with Python&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/mistral-3-models" rel="noopener noreferrer"&gt;Mistral 3 Models on DigitalOcean&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>openclaw</category>
      <category>learning</category>
      <category>digitalocean</category>
    </item>
  </channel>
</rss>
