<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: kimi ene</title>
    <description>The latest articles on Forem by kimi ene (@kimi_ene).</description>
    <link>https://forem.com/kimi_ene</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2966524%2F2e785c63-2434-4510-88b7-ba26055dc6c7.png</url>
      <title>Forem: kimi ene</title>
      <link>https://forem.com/kimi_ene</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/kimi_ene"/>
    <language>en</language>
    <item>
      <title>How to Deploy an LLM Locally and Make It Accessible from the Internet</title>
      <dc:creator>kimi ene</dc:creator>
      <pubDate>Sat, 29 Mar 2025 18:47:02 +0000</pubDate>
      <link>https://forem.com/kimi_ene/how-to-deploy-a-llm-locally-and-make-it-accessible-from-the-internet-5a9e</link>
      <guid>https://forem.com/kimi_ene/how-to-deploy-a-llm-locally-and-make-it-accessible-from-the-internet-5a9e</guid>
      <description>&lt;p&gt;This post shares my personal experience deploying an LLM locally and making it accessible from the public internet.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Requirements&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A personal computer&lt;/li&gt;
&lt;li&gt;A server with a public IP address&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The process is divided into three steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use Ollama to deploy the &lt;code&gt;Deepseek-R1&lt;/code&gt; model locally.&lt;/li&gt;
&lt;li&gt;Deploy Open-WebUI.&lt;/li&gt;
&lt;li&gt;Use Neutrino-Proxy to enable NAT traversal.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Deploying an LLM Locally with Ollama&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;First, let me explain why I chose to deploy the model on my personal computer instead of the server. Simply put, I can't afford a high-performance server. My server only has 2 CPU cores and 2GB of RAM, which is far from sufficient for deploying LLMs.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1. Download Ollama&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://ollama.com/download" rel="noopener noreferrer"&gt;Download Ollama on Windows&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Go to the Ollama website, download the installer, and install it on your computer.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2. Running Ollama&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Ollama directory looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuydf40qgl6no1ixje58p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuydf40qgl6no1ixje58p.png" width="311" height="264"&gt;&lt;/a&gt;    &lt;/p&gt;

&lt;p&gt;We won't use &lt;code&gt;ollama app.exe&lt;/code&gt;. Instead, we'll use &lt;code&gt;ollama.exe&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Open Git Bash or CMD; running any Ollama command will start the service:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;serve&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Start ollama&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;create&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create a model from a Modelfile&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;show&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Show information for a model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;run&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Run a model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;stop&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stop a running model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pull&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pull a model from a registry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;push&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Push a model to a registry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;list&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ps&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List running models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Copy a model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;rm&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Remove a model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;help&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Help about any command&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Running &lt;code&gt;deepseek-r1:8b&lt;/code&gt;&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run deepseek-r1:8b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can actually skip step 2, because running &lt;code&gt;ollama run deepseek-r1:8b&lt;/code&gt; starts the Ollama service automatically.&lt;/p&gt;

&lt;p&gt;Alternatively, you can pull the model first and then run it.&lt;/p&gt;
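&lt;p&gt;For example (assuming Ollama is installed and on your &lt;code&gt;PATH&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Download the weights first (the pull is resumable), then start an interactive session
ollama pull deepseek-r1:8b
ollama run deepseek-r1:8b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;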

&lt;p&gt;After running the command, you'll see a command-line interface where you can interact with the model. For example, you can say hello:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2kergteg28ohw4u4ruej.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2kergteg28ohw4u4ruej.png" width="477" height="176"&gt;&lt;/a&gt;    &lt;/p&gt;

&lt;p&gt;At this point, the model is successfully deployed.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;4. Accessing the API&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In practice, no one interacts with an LLM via the command line. Instead, you can use the API provided by Ollama. Check out the API documentation here: &lt;a href="https://github.com/ollama/ollama/blob/main/docs/api.md" rel="noopener noreferrer"&gt;ollama/docs/api.md at main · ollama/ollama (github.com)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There are two main APIs for interacting with the model:&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;POST /api/generate&lt;/strong&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:11434/api/generate &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
  "model": "deepseek-r1:8b",
  "prompt": "What color is the sky at different times of the day? Respond using JSON",
  "format": "json",
  "stream": false
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;strong&gt;POST /api/chat&lt;/strong&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:11434/api/chat &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
  "model": "deepseek-r1:8b",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ]
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference, as I understand it, is that &lt;code&gt;/generate&lt;/code&gt; takes a single &lt;code&gt;prompt&lt;/code&gt;, while &lt;code&gt;/chat&lt;/code&gt; takes a &lt;code&gt;messages&lt;/code&gt; array, which lets you carry the conversation history so the model can "keep a chat memory." &lt;code&gt;/chat&lt;/code&gt; is the more general of the two, so it's the one I normally use (I could be wrong, but it works for me).&lt;/p&gt;
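&lt;p&gt;To make the "chat memory" point concrete: the history lives entirely in the &lt;code&gt;messages&lt;/code&gt; array you send, so on each turn the client includes the model's earlier replies itself. A sketch of a second-turn request (assuming Ollama is running on its default port &lt;code&gt;11434&lt;/code&gt;; the assistant reply is abbreviated):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Follow-up turn: resend the whole conversation so far
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:8b",
  "messages": [
    {"role": "user", "content": "why is the sky blue?"},
    {"role": "assistant", "content": "Because of Rayleigh scattering..."},
    {"role": "user", "content": "and why does it turn red at sunset?"}
  ],
  "stream": false
}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;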

&lt;p&gt;Most parameters in the API have default values, so you can use them as needed. Refer to the documentation for details: &lt;a href="https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values" rel="noopener noreferrer"&gt;ollama/docs/modelfile.md at main · ollama/ollama (github.com)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you exit Ollama, you can restart it later using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You don't need to load a model beforehand; the API automatically loads whichever model a request specifies.&lt;/p&gt;
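&lt;p&gt;To confirm the service is up after a restart, you can query the &lt;code&gt;/api/tags&lt;/code&gt; endpoint, which lists the locally installed models (again assuming the default port):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Returns a JSON list of installed models if Ollama is running
curl http://localhost:11434/api/tags
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;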




&lt;h2&gt;
  
  
  &lt;strong&gt;Deploying Open-WebUI&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Visit the Open-WebUI documentation here: &lt;a href="https://docs.openwebui.com/#manual-installation" rel="noopener noreferrer"&gt;🏡 Home | Open WebUI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There are multiple ways to deploy Open-WebUI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using Docker&lt;/li&gt;
&lt;li&gt;Manual installation: &lt;a href="https://docs.openwebui.com/#manual-installation" rel="noopener noreferrer"&gt;🏡 Home | Open WebUI&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I chose Docker on a Linux server because I don't have Docker installed on my Windows machine and didn't want to use the manual &lt;code&gt;uv&lt;/code&gt;-based installation.&lt;/p&gt;

&lt;p&gt;Run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt;  &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--name&lt;/span&gt; open-webui &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-p&lt;/span&gt; 3101:8080 &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--add-host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host.docker.internal:host-gateway &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;OLLAMA_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://host:port &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;ENABLE_OPENAI_API&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-v&lt;/span&gt; /root/open-webui:/app/backend/data &lt;span class="se"&gt;\&lt;/span&gt;
ghcr.io/open-webui/open-webui:main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  &lt;strong&gt;Explanation of the Command&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;OLLAMA_BASE_URL&lt;/code&gt;&lt;/strong&gt;: This is the address of the Ollama service we started earlier. Since my Ollama is running on my Windows machine and Open-WebUI is deployed on the server, I need to use NAT traversal (explained later).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ENABLE_OPENAI_API&lt;/code&gt;&lt;/strong&gt;: I set this to &lt;code&gt;false&lt;/code&gt; because I don't want Open-WebUI to fetch OpenAI models. I only want to use the models I deployed with Ollama. You can enable or disable this based on your needs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I used only a few environment variables here, but there are many more available in the documentation. You can configure them as needed: &lt;a href="https://docs.openwebui.com/getting-started/env-configuration#openai" rel="noopener noreferrer"&gt;🌍 Environment Variable Configuration | Open WebUI&lt;/a&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Using Open-WebUI&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;After running the Docker container, open your browser and go to &lt;code&gt;http://ip:3101&lt;/code&gt;. You'll see the login page:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvd0fty0ly4en6dnanjq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvd0fty0ly4en6dnanjq.png" width="800" height="374"&gt;&lt;/a&gt;    &lt;/p&gt;

&lt;p&gt;The first user to register and log in will become the administrator of this Open-WebUI instance. The account and password will be stored in its local database, so you don't need to configure an external database.&lt;/p&gt;

&lt;p&gt;After logging in, you'll see the homepage:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6vn367m7cjxcemq0s0u8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6vn367m7cjxcemq0s0u8.png" width="800" height="374"&gt;&lt;/a&gt;    &lt;/p&gt;

&lt;p&gt;At this point, if Open-WebUI and Ollama are on the same local network and configured correctly, you should see the &lt;code&gt;Select a model&lt;/code&gt; dropdown with the &lt;code&gt;deepseek-r1:8b&lt;/code&gt; model we just ran.&lt;/p&gt;

&lt;p&gt;However, since my setup is different, I need to use NAT traversal.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;NAT Traversal&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If both Open-WebUI and Ollama are deployed on the same local network, you only need to expose Open-WebUI to the public internet. However, if they are on separate networks (as in my case), you'll need to expose Ollama as well.&lt;/p&gt;

&lt;p&gt;I used Neutrino-Proxy for this purpose. You can find the documentation here: &lt;a href="https://neutrino-proxy.dromara.org/neutrino-proxy/" rel="noopener noreferrer"&gt;Neutrino-Proxy (dromara.org)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Other NAT traversal tools can also work, but NAT traversal is not the focus of this post, so I won't go into detail.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Important Notes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;By default, Ollama binds to &lt;code&gt;127.0.0.1&lt;/code&gt; instead of &lt;code&gt;0.0.0.0&lt;/code&gt;. If you want to expose Ollama to the public internet, you can use Nginx as a reverse proxy or change the binding IP to &lt;code&gt;0.0.0.0&lt;/code&gt;. Refer to the documentation here: &lt;a href="https://github.com/ollama/ollama/blob/main/docs/faq.md#setting-environment-variables-on-windows" rel="noopener noreferrer"&gt;ollama/docs/faq.md at main · ollama/ollama (github.com)&lt;/a&gt;. Otherwise, Ollama will throw a &lt;code&gt;403&lt;/code&gt; error.&lt;/p&gt;
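&lt;p&gt;As a sketch of the second option: per the linked FAQ, the bind address is controlled by the &lt;code&gt;OLLAMA_HOST&lt;/code&gt; environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Windows (CMD): persist the variable, then restart Ollama from a new terminal
setx OLLAMA_HOST "0.0.0.0"

# Linux/macOS: set it just for this serve process
OLLAMA_HOST=0.0.0.0 ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If browser clients still get &lt;code&gt;403&lt;/code&gt; responses, the allowed request origins can be widened with &lt;code&gt;OLLAMA_ORIGINS&lt;/code&gt; (also covered in the FAQ).&lt;/p&gt;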

&lt;p&gt;Exposing Ollama to the public internet carries some risks. Unless it's for learning purposes or personal use, it's generally not recommended.&lt;/p&gt;

&lt;p&gt;Once NAT traversal is configured, set the public address of Ollama in &lt;code&gt;OLLAMA_BASE_URL&lt;/code&gt;, and everything should work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd3r2q8c1odp5s87c6jqc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd3r2q8c1odp5s87c6jqc.png" width="800" height="374"&gt;&lt;/a&gt;    &lt;/p&gt;

&lt;p&gt;Now you can start chatting:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faqidwyxupa6q4v2rliu7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faqidwyxupa6q4v2rliu7.png" width="800" height="374"&gt;&lt;/a&gt;    &lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Summary&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This is my personal method for deploying an LLM locally. It's not a tutorial or the most optimal solution, just what worked for me. Feel free to adapt it to your own setup!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>learning</category>
      <category>llm</category>
      <category>chatgpt</category>
    </item>
  </channel>
</rss>
