<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sumit Roy</title>
    <description>The latest articles on Forem by Sumit Roy (@sumit_roy9007).</description>
    <link>https://forem.com/sumit_roy9007</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3448808%2Ffb349822-ca33-40a8-9dc6-b959c08d5210.jpg</url>
      <title>Forem: Sumit Roy</title>
      <link>https://forem.com/sumit_roy9007</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sumit_roy9007"/>
    <language>en</language>
    <item>
      <title>Zero-Downtime VM to Kubernetes Migration with Istio: A Complete Production Guide</title>
      <dc:creator>Sumit Roy</dc:creator>
      <pubDate>Mon, 22 Sep 2025 02:37:50 +0000</pubDate>
      <link>https://forem.com/sumit_roy9007/zero-downtime-vm-to-kubernetes-migration-with-istio-a-complete-production-guide-j1d</link>
      <guid>https://forem.com/sumit_roy9007/zero-downtime-vm-to-kubernetes-migration-with-istio-a-complete-production-guide-j1d</guid>
      <description>&lt;p&gt;I was troubleshooting a failed migration for one of my previous projects, watching our legacy service crash as we tried moving it from VMs to Kubernetes. The traditional 'maintenance window and hope' approach wasn't working.&lt;/p&gt;

&lt;p&gt;That's when I discovered something magical: &lt;strong&gt;&lt;em&gt;hybrid deployments with Istio service mesh&lt;/em&gt;&lt;/strong&gt;. The ability to run applications on both VMs and Kubernetes simultaneously, gradually shifting traffic with zero downtime.&lt;/p&gt;

&lt;p&gt;So, I asked myself:&lt;br&gt;
&lt;em&gt;"Can I migrate legacy applications from VMs to Kubernetes without any service interruption, while keeping full control over traffic routing and the ability to instantly roll back?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The answer: &lt;strong&gt;&lt;em&gt;Absolutely. Using k8s + Istio + WorkloadEntry + Canary Deployments&lt;/em&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Here, I have simulated the exact approach on my local machine.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  🎯 &lt;strong&gt;What You'll Learn&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Set up a hybrid VM + Kubernetes deployment using Istio service mesh&lt;/li&gt;
&lt;li&gt;Register VM applications in Kubernetes service discovery with WorkloadEntry&lt;/li&gt;
&lt;li&gt;Implement canary deployments with intelligent traffic splitting&lt;/li&gt;
&lt;li&gt;Master production-grade migration strategies with instant rollback capabilities&lt;/li&gt;
&lt;li&gt;Handle real-world migration challenges and troubleshooting&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  🛠️ &lt;strong&gt;Tech Stack&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;k3d&lt;/strong&gt; - Lightweight Kubernetes cluster for local development&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Istio&lt;/strong&gt; - Service mesh for traffic management and observability
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WorkloadEntry&lt;/strong&gt; - Register VM workloads in Kubernetes service registry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ServiceEntry&lt;/strong&gt; - Define external services in the mesh&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VirtualService&lt;/strong&gt; - Advanced traffic routing and canary deployments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node.js&lt;/strong&gt; - Sample application (easily replaceable with any tech stack)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  📚 &lt;strong&gt;The Migration Challenge&lt;/strong&gt;
&lt;/h2&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Why Traditional Migration Fails&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Most organizations attempt migrations like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Maintenance Window&lt;/strong&gt; → Schedule downtime (expensive!)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pray and Deploy&lt;/strong&gt; → Deploy new version, hope nothing breaks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All or Nothing&lt;/strong&gt; → 100% traffic shift immediately&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Panic Mode&lt;/strong&gt; → When things go wrong, scramble to rollback&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Sleepless nights, angry customers, and failed projects.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;The Istio Solution&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Instead of switching instantly, we create a &lt;strong&gt;hybrid architecture&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;VM Application&lt;/strong&gt; serves 80% of traffic initially&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes Application&lt;/strong&gt; serves 20% of traffic (canary)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradual Migration&lt;/strong&gt; → Shift from 80/20 → 50/50 → 20/80 → 0/100&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instant Rollback&lt;/strong&gt; → One command reverts all traffic to VM&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  🧑‍💻 &lt;strong&gt;Building Our Migration Lab&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let's simulate a real-world scenario where we migrate a Node.js API from a VM to Kubernetes.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1: &lt;strong&gt;Create the "Legacy" VM Application&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;First, let's build our legacy application that's currently running on a VM:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;mkdir migration-demo&lt;/code&gt;&lt;br&gt;
&lt;code&gt;cd migration-demo&lt;/code&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  &lt;strong&gt;Create a simple Node.js API&lt;/strong&gt;
&lt;/h1&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt; app.js &amp;lt;&amp;lt; 'EOF'
const express = require('express');
const os = require('os');
const app = express();
const port = 3000;

app.get('/', (req, res) =&amp;gt; {
    res.json({
        message: 'Hello from Migration Demo!',
        hostname: os.hostname(),
        platform: 'VM',
        timestamp: new Date().toISOString(),
        version: 'v1.0'
    });
});

app.get('/health', (req, res) =&amp;gt; {
    res.json({ status: 'healthy' });
});

app.listen(port, '0.0.0.0', () =&amp;gt; {
    console.log(`App running on port ${port}`);
});
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h1&gt;
  
  
  Create package.json
&lt;/h1&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt; package.json &amp;lt;&amp;lt; 'EOF'
{
  "name": "migration-demo",
  "version": "1.0.0",
  "main": "app.js",
  "scripts": {
    "start": "node app.js"
  },
  "dependencies": {
    "express": "^4.18.2"
  }
}
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Install and run our "VM" application:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Node.js and dependencies&lt;/span&gt;
&lt;span class="sb"&gt;`&lt;/span&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;

&lt;span class="c"&gt;# Start the VM application&lt;/span&gt;
&lt;span class="sb"&gt;`&lt;/span&gt;npm start &amp;amp;&lt;span class="sb"&gt;`&lt;/span&gt;

&lt;span class="c"&gt;# Test it's working&lt;/span&gt;
&lt;span class="sb"&gt;`&lt;/span&gt;curl http://localhost:3000&lt;span class="sb"&gt;`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hello from Migration Demo!"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hostname"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your-machine"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"platform"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"VM"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-01-27T..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"v1.0"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This is our legacy application running on the "VM".&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: &lt;strong&gt;Containerize for Kubernetes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Now let's prepare the same application for Kubernetes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h1&gt;
  
  
  &lt;strong&gt;Create Dockerfile&lt;/strong&gt;
&lt;/h1&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt; Dockerfile &amp;lt;&amp;lt; 'EOF'
FROM node:18-alpine

WORKDIR /app
COPY package*.json ./
RUN npm install --only=production
COPY . .
EXPOSE 3000
CMD ["npm", "start"]
EOF

# Build Docker image
docker build -t migration-demo:v1.0 .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: &lt;strong&gt;Set Up Kubernetes Cluster&lt;/strong&gt;
&lt;/h3&gt;


&lt;h1&gt;
  
  
  Create k3d cluster with port mappings
&lt;/h1&gt;

&lt;p&gt;&lt;code&gt;k3d cluster create migration-cluster \&lt;br&gt;
  --port "8080:80@loadbalancer" \&lt;br&gt;
  --port "8443:443@loadbalancer" \&lt;br&gt;
  --agents 2&lt;/code&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Load our image into the cluster
&lt;/h1&gt;

&lt;p&gt;&lt;code&gt;k3d image import migration-demo:v1.0 -c migration-cluster&lt;/code&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Verify cluster
&lt;/h1&gt;

&lt;p&gt;&lt;code&gt;kubectl get nodes&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
### Step 4: **Install Istio Service Mesh**

  bash
# Install Istio
`istioctl install --set values.defaultRevision=default -y`

# Enable automatic sidecar injection
`kubectl label namespace default istio-injection=enabled`

# Verify Istio is running
`kubectl get pods -n istio-system`
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait until all Istio pods show &lt;code&gt;Running&lt;/code&gt; status.&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 The Magic: &lt;strong&gt;Hybrid Deployment Setup&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This is where it gets exciting! We're going to register our VM application with Istio so both VM and Kubernetes versions can coexist.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: &lt;strong&gt;Deploy Kubernetes Version&lt;/strong&gt;
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt; k8s-deployment.yaml &amp;lt;&amp;lt; 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: migration-demo-k8s
  labels:
    app: migration-demo
    version: k8s
spec:
  replicas: 2
  selector:
    matchLabels:
      app: migration-demo
      version: k8s
  template:
    metadata:
      labels:
        app: migration-demo
        version: k8s
    spec:
      containers:
      - name: migration-demo
        image: migration-demo:v1.0
        ports:
        - containerPort: 3000
        env:
        - name: PLATFORM
          value: "Kubernetes"
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: migration-demo-service
spec:
  ports:
  - port: 3000
    name: http
  selector:
    app: migration-demo
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;kubectl apply -f k8s-deployment.yaml&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Wait for pods to be ready:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl get pods&lt;/code&gt;&lt;br&gt;
 Should show: &lt;br&gt;
&lt;code&gt;migration-demo-k8s-xxx 2/2 Running (2/2 = app + istio-proxy)&lt;/code&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 6: &lt;strong&gt;Register VM in Service Mesh&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Here's the breakthrough - we register our VM application with Istio using &lt;strong&gt;WorkloadEntry&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt; vm-workloadentry.yaml &amp;lt;&amp;lt; 'EOF'
apiVersion: networking.istio.io/v1beta1
kind: WorkloadEntry
metadata:
  name: migration-demo-vm
  namespace: default
spec:
  address: "host.k3d.internal"  # k3d's way to reach host machine
  ports:
    http: 3000
  labels:
    app: migration-demo
    version: vm
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: migration-demo-vm-service
  namespace: default
spec:
  hosts:
  - migration-demo-vm.local
  ports:
  - number: 3000
    name: http
    protocol: HTTP
  location: MESH_EXTERNAL
  resolution: STATIC
  endpoints:
  - address: "host.k3d.internal"
    ports:
      http: 3000
    labels:
      app: migration-demo
      version: vm
EOF

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f vm-workloadentry.yaml
kubectl apply -f vm-serviceEntry.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What just happened?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Our VM application is now part of the Kubernetes service discovery!&lt;/li&gt;
&lt;li&gt;Istio can route traffic to both VM and Kubernetes versions&lt;/li&gt;
&lt;li&gt;Both applications share the same service name: &lt;code&gt;migration-demo-service&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
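
&lt;p&gt;A quick way to sanity-check the registration (a sketch; your exact output will differ) is to list the new mesh resources and confirm the Envoy sidecar actually learned the VM endpoint:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Confirm the WorkloadEntry and ServiceEntry were created
kubectl get workloadentry,serviceentry -n default

# Inspect the endpoints known to one of the k8s pods' sidecars
istioctl proxy-config endpoints $(kubectl get pod -l version=k8s -o jsonpath='{.items[0].metadata.name}') | grep 3000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;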

&lt;h2&gt;
  
  
  🎛️ &lt;strong&gt;Canary Deployment: The Migration Control Panel&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Now for the most powerful part - intelligent traffic routing:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 7: &lt;strong&gt;Configure Traffic Management&lt;/strong&gt;
&lt;/h3&gt;


&lt;h1&gt;
  
  
  &lt;strong&gt;Create traffic routing rules&lt;/strong&gt;
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt; destination-rule.yaml &amp;lt;&amp;lt; 'EOF'
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: migration-demo-destination
spec:
  host: migration-demo-service
  subsets:
  - name: vm
    labels:
      version: vm
  - name: k8s
    labels:
      version: k8s
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt; virtual-service.yaml &amp;lt;&amp;lt; 'EOF'
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: migration-demo-vs
spec:
  hosts:
  - migration-demo-service
  http:
  - match:
    - headers:
        canary:
          exact: "true"
    route:
    - destination:
        host: migration-demo-service
        subset: k8s
  - route:
    - destination:
        host: migration-demo-service
        subset: vm
      weight: 80
    - destination:
        host: migration-demo-service
        subset: k8s
      weight: 20
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;kubectl apply -f destination-rule.yaml&lt;br&gt;
kubectl apply -f virtual-service.yaml&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What we just created:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;80% traffic&lt;/strong&gt; goes to VM (safe, proven version)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;20% traffic&lt;/strong&gt; goes to Kubernetes (canary testing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature flag&lt;/strong&gt;: &lt;code&gt;canary: true&lt;/code&gt; header routes 100% to Kubernetes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instant control&lt;/strong&gt;: Change weights anytime without deployment&lt;/li&gt;
&lt;/ul&gt;
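
&lt;p&gt;"Change weights anytime" is literally one command: patch the VirtualService in place. A sketch for the 50/50 phase (note that a merge patch replaces the whole &lt;code&gt;http&lt;/code&gt; list, so restate the canary header rule in the patch if you want to keep it):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl patch virtualservice migration-demo-vs --type='merge' -p='
{
  "spec": {
    "http": [{
      "route": [
        {"destination": {"host": "migration-demo-service", "subset": "vm"},  "weight": 50},
        {"destination": {"host": "migration-demo-service", "subset": "k8s"}, "weight": 50}
      ]
    }]
  }
}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;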
&lt;h3&gt;
  
  
  Step 8: &lt;strong&gt;Test the Migration in Action&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Create a test client:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt; test-pod.yaml &amp;lt;&amp;lt; 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: test-client
spec:
  containers:
  - name: curl
    image: curlimages/curl:latest
    command: ["/bin/sh"]
    args: ["-c", "while true; do sleep 3600; done"]
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;kubectl apply -f test-pod.yaml&lt;br&gt;
kubectl wait --for=condition=ready pod/test-client&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test normal traffic distribution:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo "Testing normal traffic (80% VM, 20% K8s):"
for i in {1..10}; do
  kubectl exec test-client -- curl -s http://migration-demo-service:3000 | grep '"platform"'
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Test canary routing:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo "Testing canary header (100% K8s):"
for i in {1..5}; do
  kubectl exec test-client -- curl -s -H "canary: true" http://migration-demo-service:3000 | grep '"platform"'
done

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;🎉 You should see:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Normal requests: Mix of &lt;code&gt;"platform": "VM"&lt;/code&gt; and &lt;code&gt;"platform": "Kubernetes"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Canary requests: All show &lt;code&gt;"platform": "Kubernetes"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  📈 &lt;strong&gt;Production Migration Strategy&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Now that we have the foundation, here's how you'd execute this in production:&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: Initial Deployment (Week 1)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Start conservative: 95% VM, 5% K8s&lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;95&lt;/span&gt;  &lt;span class="c1"&gt;# VM&lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;   &lt;span class="c1"&gt;# Kubernetes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Phase 2: Confidence Building (Week 2-3)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Increase gradually as metrics look good&lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;70&lt;/span&gt;  &lt;span class="c1"&gt;# VM  &lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;  &lt;span class="c1"&gt;# Kubernetes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Phase 3: Equal Split Testing (Week 4)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Test at scale with equal traffic&lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;  &lt;span class="c1"&gt;# VM&lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;  &lt;span class="c1"&gt;# Kubernetes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Phase 4: Kubernetes Majority (Week 5)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Shift majority to K8s&lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;  &lt;span class="c1"&gt;# VM&lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;  &lt;span class="c1"&gt;# Kubernetes  &lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Phase 5: Migration Complete (Week 6)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Full migration&lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;   &lt;span class="c1"&gt;# VM (decommission)&lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt; &lt;span class="c1"&gt;# Kubernetes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  🐞 Production Challenges I've Solved
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Problem 1: Database Connection Storms
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; K8s pods open far more DB connections than the VM did&lt;br&gt;
&lt;strong&gt;Solution:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.istio.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DestinationRule&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;trafficPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;connectionPool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;tcp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;maxConnections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
        &lt;span class="na"&gt;connectTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Problem 2: Session Affinity Issues
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; User sessions break during traffic shifts&lt;br&gt;
&lt;strong&gt;Solution:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.istio.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DestinationRule&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;trafficPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;loadBalancer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;consistentHash&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;httpCookieName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JSESSIONID"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Problem 3: Instant Rollback Needed
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Command for emergency rollback:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl patch virtualservice migration-demo-vs &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'merge'&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'
{
  "spec": {
    "http": [{
      "route": [{
        "destination": {
          "host": "migration-demo-service", 
          "subset": "vm"
        },
        "weight": 100
      }]
    }]
  }
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  🎯 Key Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What we accomplished:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Zero downtime migration&lt;/strong&gt; with traffic splitting&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Hybrid VM + Kubernetes&lt;/strong&gt; architecture
&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Instant rollback&lt;/strong&gt; capability&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Feature flagging&lt;/strong&gt; with header-based routing&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Production-ready&lt;/strong&gt; traffic management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This approach is used by:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Netflix (microservices migration)&lt;/li&gt;
&lt;li&gt;Spotify (platform modernization)
&lt;/li&gt;
&lt;li&gt;Airbnb (infrastructure consolidation)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  💬 &lt;strong&gt;Let's Connect&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If you implement this migration strategy or face any challenges, I'd love to hear about it!&lt;/p&gt;

&lt;p&gt;GitHub → &lt;a href="https://github.com/sumitroyyy/Zero-Down-time-migration-to-k8s/tree/main" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;br&gt;
LinkedIn → &lt;a href="https://www.linkedin.com/in/sumit-roy-299476150/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;br&gt;
X (Twitter) → &lt;a href="https://x.com/Royy9007" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Drop a star ⭐ on the repo if it helped you — it keeps me motivated to write more experiments like this!&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;Questions about your specific migration scenario?&lt;/strong&gt; Let's discuss in the comments below. Every legacy system has unique challenges, and I've probably faced something similar!&lt;/p&gt;




</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>cloud</category>
      <category>opensource</category>
    </item>
    <item>
      <title>📊 Adding Observability to Gemma 2B on Kubernetes with Prometheus &amp; Grafana</title>
      <dc:creator>Sumit Roy</dc:creator>
      <pubDate>Fri, 19 Sep 2025 23:09:04 +0000</pubDate>
      <link>https://forem.com/sumit_roy9007/adding-observability-to-gemma-2b-on-kubernetes-with-prometheus-grafana-4aj</link>
      <guid>https://forem.com/sumit_roy9007/adding-observability-to-gemma-2b-on-kubernetes-with-prometheus-grafana-4aj</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/sumit_roy9007/running-gemma-2b-on-kubernetes-k3d-with-ollama-a-complete-local-ai-setup-2knp"&gt;Article 1&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we first set up Prometheus + Grafana for Gemma 2B on Kubernetes, I expected to see nice dashboards with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tokens per request&lt;/li&gt;
&lt;li&gt;Latency per inference&lt;/li&gt;
&lt;li&gt;Number of inferences processed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…but all we got were boring container metrics: CPU%, memory usage, restarts.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Sure, they told us the pod was alive, but nothing about the model itself.&lt;br&gt;
No clue if inference was slow, if requests were timing out, or how many tokens were processed.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;🔍 &lt;strong&gt;Debugging the Metrics Problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We checked:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prometheus scraping the Ollama pod? ✅&lt;/li&gt;
&lt;li&gt;Grafana dashboards connected? ✅&lt;/li&gt;
&lt;li&gt;Metrics endpoint on Ollama? ❌&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s when we realized:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ollama by default doesn’t expose model-level metrics.&lt;/li&gt;
&lt;li&gt;It only serves the API for inference, nothing else.&lt;/li&gt;
&lt;li&gt;Prometheus was scraping… nothing useful.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;💡 &lt;strong&gt;The Fix: Ollama Exporter as Sidecar&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While digging through GitHub issues, we found a project: &lt;strong&gt;Ollama Exporter&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It runs as a sidecar container inside the same pod as Ollama, talks to the Ollama API, and exposes real metrics at /metrics for Prometheus.&lt;/p&gt;

&lt;p&gt;Basically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ Ollama Pod ]
    ├── Ollama Server (API → 11434)
    └── Ollama Exporter (Metrics → 11435)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🛠 &lt;strong&gt;How We Integrated It&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here’s the snippet we added to the Ollama deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- name: ollama-exporter
  image: ghcr.io/jmorganca/ollama-exporter:latest
  ports:
    - containerPort: 11435
  env:
    - name: OLLAMA_HOST
      value: "http://localhost:11434"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And in Prometheus config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;scrape_configs:
  - job_name: 'ollama'
    static_configs:
      - targets: ['ollama-service:11435']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📊 &lt;strong&gt;The Metrics We Finally Got&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After adding the exporter, Grafana lit up with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Metric Name         What It Shows
ollama_requests_total   Number of inference requests
ollama_latency_seconds  Latency per inference request
ollama_tokens_processed Tokens processed per inference
ollama_model_load_time  Time taken to load Gemma 2B model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Suddenly, we had real model observability, not just pod health.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;🚀 &lt;strong&gt;Lessons Learned&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Default Kubernetes metrics ≠ Model metrics → You need a sidecar like Ollama Exporter.&lt;/li&gt;
&lt;li&gt;One scrape job away → Prometheus won’t scrape what you don’t tell it to.&lt;/li&gt;
&lt;li&gt;Metrics help tuning → We later used these metrics to set CPU/memory requests properly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🔮 &lt;strong&gt;What’s Next?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now that we have model-level observability, the next steps are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adding alerting rules for latency spikes or token errors.&lt;/li&gt;
&lt;li&gt;Exporting historical metrics into long-term storage (e.g., Thanos or Mimir).&lt;/li&gt;
&lt;li&gt;Trying multiple models (&lt;code&gt;Gemma 3, LLaMA 3, Phi-3&lt;/code&gt;) and comparing inference latency across them.&lt;/li&gt;
&lt;/ul&gt;
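
&lt;p&gt;For the alerting piece, a minimal Prometheus rule might look like this — the 2s threshold and 10m window are placeholders to tune against your own latency baseline, and it again assumes &lt;code&gt;ollama_latency_seconds&lt;/code&gt; is a histogram:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;groups:
  - name: ollama-alerts
    rules:
      - alert: OllamaHighLatency
        expr: histogram_quantile(0.95, rate(ollama_latency_seconds_bucket[5m])) &gt; 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p95 Ollama inference latency above 2s for 10 minutes"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;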

&lt;p&gt;💬 &lt;strong&gt;Let’s Connect&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you try this setup or improve it, I’d love to hear from you!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub → &lt;a href="https://github.com/sumitroyyy/Ollama_gemma3_k3d_Observability" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LinkedIn → &lt;a href="https://www.linkedin.com/in/sumit-roy-299476150/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;X (Twitter) → &lt;a href="https://x.com/Royy9007" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop a star ⭐ on the repo if it helped you — it keeps me motivated to write more experiments like this!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Running Gemma 2B on Kubernetes (k3d) with Ollama: A Complete Local AI Setup</title>
      <dc:creator>Sumit Roy</dc:creator>
      <pubDate>Fri, 19 Sep 2025 22:50:19 +0000</pubDate>
      <link>https://forem.com/sumit_roy9007/running-gemma-2b-on-kubernetes-k3d-with-ollama-a-complete-local-ai-setup-2knp</link>
      <guid>https://forem.com/sumit_roy9007/running-gemma-2b-on-kubernetes-k3d-with-ollama-a-complete-local-ai-setup-2knp</guid>
      <description>&lt;p&gt;I was fascinated by how people were running large language models locally, fully offline, without depending on expensive GPU clusters or cloud APIs.&lt;/p&gt;

&lt;p&gt;But when I tried deploying Gemma 2B manually on my machine, the process was messy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Large model weights needed downloading&lt;/li&gt;
&lt;li&gt;Restarting the container meant re-downloading everything&lt;/li&gt;
&lt;li&gt;No orchestration or resilience — if the container died, my setup was gone&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So, I asked myself:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“Can I run Gemma 2B efficiently, fully containerized, orchestrated by Kubernetes, with a clean local setup?”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The answer: Yes. Using k3d + Ollama + Kubernetes + Gemma 2B.&lt;/p&gt;

&lt;p&gt;🎯 &lt;strong&gt;What You’ll Learn&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deploy Gemma 2B using Ollama inside a k3d Kubernetes cluster&lt;/li&gt;
&lt;li&gt;Expose it via a service for local access&lt;/li&gt;
&lt;li&gt;Persist model weights to avoid re-downloading&lt;/li&gt;
&lt;li&gt;Basic troubleshooting for pods and containers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;🛠️ &lt;strong&gt;Tech Stack&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;k3d → Lightweight Kubernetes cluster inside Docker&lt;/li&gt;
&lt;li&gt;Ollama → Container for running LLMs locally&lt;/li&gt;
&lt;li&gt;Gemma 2B → Lightweight LLM (~1.7GB) from Google, runs locally&lt;/li&gt;
&lt;li&gt;WSL2 → Linux environment on Windows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;📚 &lt;strong&gt;Concepts Before We Start&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;What is Ollama&lt;/strong&gt;?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Ollama is a simple tool for running LLMs locally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pulls models like Gemma, Llama, Phi&lt;/li&gt;
&lt;li&gt;Provides a REST API for inference&lt;/li&gt;
&lt;li&gt;Runs entirely offline once weights are downloaded&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ollama run gemma:2b&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Gives you a local chatbot with zero cloud dependency.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;&lt;strong&gt;Why Kubernetes (k3d)?&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Instead of running Ollama bare-metal, we use k3d:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local K8s cluster → k3d runs Kubernetes inside Docker, very lightweight&lt;/li&gt;
&lt;li&gt;Pods &amp;amp; PVCs → Pods run containers, PVCs store model weights&lt;/li&gt;
&lt;li&gt;Services → Expose the Ollama API on localhost easily&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="3"&gt;
&lt;li&gt;&lt;strong&gt;Storage with PVC&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Without PVCs, if your pod dies, you lose model weights.&lt;br&gt;
PVC ensures models survive restarts and redeployments.&lt;/p&gt;

&lt;p&gt;🧑‍💻 &lt;strong&gt;Step-by-Step Setup&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Step 1: &lt;strong&gt;Install k3d and create a cluster&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash&lt;/code&gt;&lt;br&gt;
&lt;code&gt;k3d cluster create gemma-cluster --agents 1 --servers 1 -p "11434:11434@loadbalancer"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;-p&lt;/code&gt; mapping exposes the cluster's load balancer on localhost, which is what makes the Ollama service reachable at &lt;code&gt;localhost:11434&lt;/code&gt; in Step 4.&lt;/p&gt;

&lt;p&gt;Step 2: &lt;strong&gt;Deploy Ollama + Gemma 2B&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create ollama-deployment.yaml:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434
        volumeMounts:
        - name: model-storage
          mountPath: /root/.ollama
      volumes:
      - name: model-storage
        persistentVolumeClaim:
          claimName: ollama-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Service
metadata:
  name: ollama-service
spec:
  selector:
    app: ollama
  ports:
  - protocol: TCP
    port: 11434
    targetPort: 11434
  type: LoadBalancer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Apply it:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl apply -f ollama-deployment.yaml&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Step 3: &lt;strong&gt;Pull Gemma 2B Model&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;kubectl exec -it deploy/ollama -- ollama pull gemma:2b&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Step 4: &lt;strong&gt;Test the API&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl http://localhost:11434/api/generate -d '{
  "model": "gemma:2b",
  "prompt": "Write a short poem about Kubernetes"
}'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
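
&lt;p&gt;By default, &lt;code&gt;/api/generate&lt;/code&gt; streams the reply as newline-delimited JSON chunks. If you prefer a single JSON object containing the full response, Ollama accepts a &lt;code&gt;stream&lt;/code&gt; flag in the request body:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl http://localhost:11434/api/generate -d '{
  "model": "gemma:2b",
  "prompt": "Write a short poem about Kubernetes",
  "stream": false
}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;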



&lt;p&gt;🐞 &lt;strong&gt;Problems I Faced &amp;amp; Fixes&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pod in CrashLoopBackOff → Increased CPU/RAM in the deployment spec&lt;/li&gt;
&lt;li&gt;Model re-downloading on restart → Used a PVC to persist weights&lt;/li&gt;
&lt;li&gt;Port not accessible → Used a LoadBalancer service + k3d port mapping&lt;/li&gt;
&lt;/ol&gt;
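
&lt;p&gt;For the CrashLoopBackOff fix, the container needs explicit resource requests and limits in the deployment spec. A sketch of what that can look like under the &lt;code&gt;ollama&lt;/code&gt; container — the values are illustrative and depend on your machine (Gemma 2B's weights alone are ~1.7GB):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resources:
  requests:
    cpu: "1"
    memory: 2Gi
  limits:
    cpu: "2"
    memory: 4Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;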

&lt;p&gt;📂 Final Project Structure&lt;br&gt;
gemma-k3d/&lt;br&gt;
├── ollama-deployment.yaml&lt;br&gt;
├── k3d-cluster-setup.sh&lt;br&gt;
└── README.md&lt;/p&gt;

&lt;p&gt;🚀 &lt;strong&gt;Next Steps&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the next article, we’ll add Prometheus + Grafana to monitor:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;CPU usage&lt;/li&gt;
&lt;li&gt;Memory usage&lt;/li&gt;
&lt;li&gt;Latency per inference&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;💬 &lt;strong&gt;Let’s Connect&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you try this setup or improve it, I’d love to hear from you!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub → &lt;a href="https://github.com/sumitroyyy/Ollama_gemma3_k3d_Observability" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LinkedIn → &lt;a href="https://www.linkedin.com/in/sumit-roy-299476150/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;X (Twitter) → &lt;a href="https://x.com/Royy9007" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop a star ⭐ on the repo if it helped you — it keeps me motivated to write more experiments like this!&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
