<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sumit Roy</title>
    <description>The latest articles on Forem by Sumit Roy (@sumit_roy9007).</description>
    <link>https://forem.com/sumit_roy9007</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3448808%2Ffb349822-ca33-40a8-9dc6-b959c08d5210.jpg</url>
      <title>Forem: Sumit Roy</title>
      <link>https://forem.com/sumit_roy9007</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sumit_roy9007"/>
    <language>en</language>
    <item>
      <title>Zero-Downtime VM to Kubernetes Migration with Istio: A Complete Production Guide</title>
      <dc:creator>Sumit Roy</dc:creator>
      <pubDate>Mon, 22 Sep 2025 02:37:50 +0000</pubDate>
      <link>https://forem.com/sumit_roy9007/zero-downtime-vm-to-kubernetes-migration-with-istio-a-complete-production-guide-j1d</link>
      <guid>https://forem.com/sumit_roy9007/zero-downtime-vm-to-kubernetes-migration-with-istio-a-complete-production-guide-j1d</guid>
      <description>&lt;p&gt;I was troubleshooting a failed migration for one of my previous projects, watching our legacy service crash as we tried moving it from VMs to Kubernetes. The traditional 'maintenance window and hope' approach wasn't working.&lt;/p&gt;

&lt;p&gt;That's when I discovered something magical: &lt;strong&gt;&lt;em&gt;hybrid deployments with Istio service mesh&lt;/em&gt;&lt;/strong&gt;. The ability to run applications on both VMs and Kubernetes simultaneously, gradually shifting traffic with zero downtime.&lt;/p&gt;

&lt;p&gt;So, I asked myself:&lt;br&gt;
&lt;em&gt;"Can I migrate legacy applications from VMs to Kubernetes without any service interruption, while keeping full control over traffic routing and the ability to instantly roll back?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The answer: &lt;strong&gt;&lt;em&gt;Absolutely. Using k8s + Istio + WorkloadEntry + Canary Deployments&lt;/em&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Here, I have simulated the exact approach on my local machine.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  🎯 &lt;strong&gt;What You'll Learn&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Set up a hybrid VM + Kubernetes deployment using Istio service mesh&lt;/li&gt;
&lt;li&gt;Register VM applications in Kubernetes service discovery with WorkloadEntry&lt;/li&gt;
&lt;li&gt;Implement canary deployments with intelligent traffic splitting&lt;/li&gt;
&lt;li&gt;Master production-grade migration strategies with instant rollback capabilities&lt;/li&gt;
&lt;li&gt;Handle real-world migration challenges and troubleshooting&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  🛠️ &lt;strong&gt;Tech Stack&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;k3d&lt;/strong&gt; - Lightweight Kubernetes cluster for local development&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Istio&lt;/strong&gt; - Service mesh for traffic management and observability
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WorkloadEntry&lt;/strong&gt; - Register VM workloads in Kubernetes service registry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ServiceEntry&lt;/strong&gt; - Define external services in the mesh&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VirtualService&lt;/strong&gt; - Advanced traffic routing and canary deployments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node.js&lt;/strong&gt; - Sample application (easily replaceable with any tech stack)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  📚 &lt;strong&gt;The Migration Challenge&lt;/strong&gt;
&lt;/h2&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Why Traditional Migration Fails&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Most organizations attempt migrations like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Maintenance Window&lt;/strong&gt; → Schedule downtime (expensive!)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pray and Deploy&lt;/strong&gt; → Deploy new version, hope nothing breaks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All or Nothing&lt;/strong&gt; → 100% traffic shift immediately&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Panic Mode&lt;/strong&gt; → When things go wrong, scramble to rollback&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Sleepless nights, angry customers, and failed projects.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;The Istio Solution&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Instead of switching instantly, we create a &lt;strong&gt;hybrid architecture&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;VM Application&lt;/strong&gt; serves 80% of traffic initially&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes Application&lt;/strong&gt; serves 20% of traffic (canary)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradual Migration&lt;/strong&gt; → Shift from 80/20 → 50/50 → 20/80 → 0/100&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instant Rollback&lt;/strong&gt; → One command reverts all traffic to VM&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  🧑‍💻 &lt;strong&gt;Building Our Migration Lab&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let's simulate a real-world scenario where we migrate a Node.js API from a VM to Kubernetes.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1: &lt;strong&gt;Create the "Legacy" VM Application&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;First, let's build our legacy application that's currently running on a VM:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;mkdir migration-demo&lt;/code&gt;&lt;br&gt;
&lt;code&gt;cd migration-demo&lt;/code&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  &lt;strong&gt;Create a simple Node.js API&lt;/strong&gt;
&lt;/h1&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt; app.js &amp;lt;&amp;lt; 'EOF'
const express = require('express');
const os = require('os');
const app = express();
const port = 3000;

app.get('/', (req, res) =&amp;gt; {
    res.json({
        message: 'Hello from Migration Demo!',
        hostname: os.hostname(),
        platform: 'VM',
        timestamp: new Date().toISOString(),
        version: 'v1.0'
    });
});

app.get('/health', (req, res) =&amp;gt; {
    res.json({ status: 'healthy' });
});

app.listen(port, '0.0.0.0', () =&amp;gt; {
    console.log(`App running on port ${port}`);
});
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h1&gt;
  
  
  Create package.json
&lt;/h1&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt; package.json &amp;lt;&amp;lt; 'EOF'
{
  "name": "migration-demo",
  "version": "1.0.0",
  "main": "app.js",
  "scripts": {
    "start": "node app.js"
  },
  "dependencies": {
    "express": "^4.18.2"
  }
}
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Install and run our "VM" application:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Node.js and dependencies&lt;/span&gt;
&lt;span class="sb"&gt;`&lt;/span&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;

&lt;span class="c"&gt;# Start the VM application&lt;/span&gt;
&lt;span class="sb"&gt;`&lt;/span&gt;npm start &amp;amp;&lt;span class="sb"&gt;`&lt;/span&gt;

&lt;span class="c"&gt;# Test it's working&lt;/span&gt;
&lt;span class="sb"&gt;`&lt;/span&gt;curl http://localhost:3000&lt;span class="sb"&gt;`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hello from Migration Demo!"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hostname"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your-machine"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"platform"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"VM"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-01-27T..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"v1.0"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This is our legacy application running on the "VM".&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: &lt;strong&gt;Containerize for Kubernetes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Now let's prepare the same application for Kubernetes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h1&gt;
  
  
  &lt;strong&gt;Create Dockerfile&lt;/strong&gt;
&lt;/h1&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt; Dockerfile &amp;lt;&amp;lt; 'EOF'
FROM node:18-alpine

WORKDIR /app
COPY package*.json ./
RUN npm install --only=production
COPY . .
EXPOSE 3000
CMD ["npm", "start"]
EOF

# Build Docker image
docker build -t migration-demo:v1.0 .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: &lt;strong&gt;Set Up Kubernetes Cluster&lt;/strong&gt;
&lt;/h3&gt;


&lt;h1&gt;
  
  
  Create k3d cluster with port mappings
&lt;/h1&gt;

&lt;p&gt;&lt;code&gt;k3d cluster create migration-cluster \&lt;br&gt;
  --port "8080:80@loadbalancer" \&lt;br&gt;
  --port "8443:443@loadbalancer" \&lt;br&gt;
  --agents 2&lt;/code&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Load our image into the cluster
&lt;/h1&gt;

&lt;p&gt;&lt;code&gt;k3d image import migration-demo:v1.0 -c migration-cluster&lt;/code&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Verify cluster
&lt;/h1&gt;

&lt;p&gt;&lt;code&gt;kubectl get nodes&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
### Step 4: **Install Istio Service Mesh**

  bash
# Install Istio
`istioctl install --set values.defaultRevision=default -y`

# Enable automatic sidecar injection
`kubectl label namespace default istio-injection=enabled`

# Verify Istio is running
`kubectl get pods -n istio-system`
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait until all Istio pods show &lt;code&gt;Running&lt;/code&gt; status.&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 The Magic: &lt;strong&gt;Hybrid Deployment Setup&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This is where it gets exciting! We're going to register our VM application with Istio so both VM and Kubernetes versions can coexist.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: &lt;strong&gt;Deploy Kubernetes Version&lt;/strong&gt;
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt; k8s-deployment.yaml &amp;lt;&amp;lt; 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: migration-demo-k8s
  labels:
    app: migration-demo
    version: k8s
spec:
  replicas: 2
  selector:
    matchLabels:
      app: migration-demo
      version: k8s
  template:
    metadata:
      labels:
        app: migration-demo
        version: k8s
    spec:
      containers:
      - name: migration-demo
        image: migration-demo:v1.0
        ports:
        - containerPort: 3000
        env:
        - name: PLATFORM
          value: "Kubernetes"
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: migration-demo-service
spec:
  ports:
  - port: 3000
    name: http
  selector:
    app: migration-demo
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;kubectl apply -f k8s-deployment.yaml&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Wait for pods to be ready:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl get pods&lt;/code&gt;&lt;br&gt;
 Should show: &lt;br&gt;
&lt;code&gt;migration-demo-k8s-xxx 2/2 Running (2/2 = app + istio-proxy)&lt;/code&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 6: &lt;strong&gt;Register VM in Service Mesh&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Here's the breakthrough - we register our VM application with Istio using &lt;strong&gt;WorkloadEntry&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt; vm-workloadentry.yaml &amp;lt;&amp;lt; 'EOF'
apiVersion: networking.istio.io/v1beta1
kind: WorkloadEntry
metadata:
  name: migration-demo-vm
  namespace: default
spec:
  address: "host.k3d.internal"  # k3d's way to reach host machine
  ports:
    http: 3000
  labels:
    app: migration-demo
    version: vm
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: migration-demo-vm-service
  namespace: default
spec:
  hosts:
  - migration-demo-vm.local
  ports:
  - number: 3000
    name: http
    protocol: HTTP
  location: MESH_EXTERNAL
  resolution: STATIC
  endpoints:
  - address: "host.k3d.internal"
    ports:
      http: 3000
    labels:
      app: migration-demo
      version: vm
EOF

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f vm-workloadentry.yaml
kubectl apply -f vm-serviceEntry.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What just happened?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Our VM application is now part of the Kubernetes service discovery!&lt;/li&gt;
&lt;li&gt;Istio can route traffic to both VM and Kubernetes versions&lt;/li&gt;
&lt;li&gt;Both applications share the same service name: &lt;code&gt;migration-demo-service&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
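
&lt;p&gt;A quick way to sanity-check the registration (a sketch; your exact output will differ) is to list the new mesh resources and confirm the Envoy sidecar actually learned the VM endpoint:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Confirm the WorkloadEntry and ServiceEntry were created
kubectl get workloadentry,serviceentry -n default

# Inspect the endpoints known to one of the k8s pods' sidecars
istioctl proxy-config endpoints $(kubectl get pod -l version=k8s -o jsonpath='{.items[0].metadata.name}') | grep 3000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;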

&lt;h2&gt;
  
  
  🎛️ &lt;strong&gt;Canary Deployment: The Migration Control Panel&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Now for the most powerful part - intelligent traffic routing:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 7: &lt;strong&gt;Configure Traffic Management&lt;/strong&gt;
&lt;/h3&gt;


&lt;h1&gt;
  
  
  &lt;strong&gt;Create traffic routing rules&lt;/strong&gt;
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt; destination-rule.yaml &amp;lt;&amp;lt; 'EOF'
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: migration-demo-destination
spec:
  host: migration-demo-service
  subsets:
  - name: vm
    labels:
      version: vm
  - name: k8s
    labels:
      version: k8s
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt; virtual-service.yaml &amp;lt;&amp;lt; 'EOF'
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: migration-demo-vs
spec:
  hosts:
  - migration-demo-service
  http:
  - match:
    - headers:
        canary:
          exact: "true"
    route:
    - destination:
        host: migration-demo-service
        subset: k8s
  - route:
    - destination:
        host: migration-demo-service
        subset: vm
      weight: 80
    - destination:
        host: migration-demo-service
        subset: k8s
      weight: 20
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;kubectl apply -f destination-rule.yaml&lt;br&gt;
kubectl apply -f virtual-service.yaml&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What we just created:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;80% traffic&lt;/strong&gt; goes to VM (safe, proven version)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;20% traffic&lt;/strong&gt; goes to Kubernetes (canary testing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature flag&lt;/strong&gt;: &lt;code&gt;canary: true&lt;/code&gt; header routes 100% to Kubernetes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instant control&lt;/strong&gt;: Change weights anytime without deployment&lt;/li&gt;
&lt;/ul&gt;
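
&lt;p&gt;"Change weights anytime" is literally one command: patch the VirtualService in place. A sketch for the 50/50 phase (note that a merge patch replaces the whole &lt;code&gt;http&lt;/code&gt; list, so restate the canary header rule in the patch if you want to keep it):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl patch virtualservice migration-demo-vs --type='merge' -p='
{
  "spec": {
    "http": [{
      "route": [
        {"destination": {"host": "migration-demo-service", "subset": "vm"},  "weight": 50},
        {"destination": {"host": "migration-demo-service", "subset": "k8s"}, "weight": 50}
      ]
    }]
  }
}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;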
&lt;h3&gt;
  
  
  Step 8: &lt;strong&gt;Test the Migration in Action&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Create a test client:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt; test-pod.yaml &amp;lt;&amp;lt; 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: test-client
spec:
  containers:
  - name: curl
    image: curlimages/curl:latest
    command: ["/bin/sh"]
    args: ["-c", "while true; do sleep 3600; done"]
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;kubectl apply -f test-pod.yaml&lt;br&gt;
kubectl wait --for=condition=ready pod/test-client&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test normal traffic distribution:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo "Testing normal traffic (80% VM, 20% K8s):"
for i in {1..10}; do
  kubectl exec test-client -- curl -s http://migration-demo-service:3000 | grep '"platform"'
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Test canary routing:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo "Testing canary header (100% K8s):"
for i in {1..5}; do
  kubectl exec test-client -- curl -s -H "canary: true" http://migration-demo-service:3000 | grep '"platform"'
done

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;🎉 You should see:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Normal requests: Mix of &lt;code&gt;"platform": "VM"&lt;/code&gt; and &lt;code&gt;"platform": "Kubernetes"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Canary requests: All show &lt;code&gt;"platform": "Kubernetes"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  📈 &lt;strong&gt;Production Migration Strategy&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Now that we have the foundation, here's how you'd execute this in production:&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: Initial Deployment (Week 1)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Start conservative: 95% VM, 5% K8s&lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;95&lt;/span&gt;  &lt;span class="c1"&gt;# VM&lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;   &lt;span class="c1"&gt;# Kubernetes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Phase 2: Confidence Building (Week 2-3)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Increase gradually as metrics look good&lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;70&lt;/span&gt;  &lt;span class="c1"&gt;# VM  &lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;  &lt;span class="c1"&gt;# Kubernetes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Phase 3: Equal Split Testing (Week 4)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Test at scale with equal traffic&lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;  &lt;span class="c1"&gt;# VM&lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;  &lt;span class="c1"&gt;# Kubernetes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Phase 4: Kubernetes Majority (Week 5)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Shift majority to K8s&lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;  &lt;span class="c1"&gt;# VM&lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;  &lt;span class="c1"&gt;# Kubernetes  &lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Phase 5: Migration Complete (Week 6)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Full migration&lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;   &lt;span class="c1"&gt;# VM (decommission)&lt;/span&gt;
&lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt; &lt;span class="c1"&gt;# Kubernetes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  🐞 Production Challenges I've Solved
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Problem 1: Database Connection Storms
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; K8s pods open far more DB connections than the VM did&lt;br&gt;
&lt;strong&gt;Solution:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.istio.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DestinationRule&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;trafficPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;connectionPool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;tcp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;maxConnections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
        &lt;span class="na"&gt;connectTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Problem 2: Session Affinity Issues
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; User sessions break during traffic shifts&lt;br&gt;
&lt;strong&gt;Solution:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.istio.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DestinationRule&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;trafficPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;loadBalancer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;consistentHash&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;httpCookieName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JSESSIONID"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Problem 3: Instant Rollback Needed
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Command for emergency rollback:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl patch virtualservice migration-demo-vs &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'merge'&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'
{
  "spec": {
    "http": [{
      "route": [{
        "destination": {
          "host": "migration-demo-service", 
          "subset": "vm"
        },
        "weight": 100
      }]
    }]
  }
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  🎯 Key Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What we accomplished:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Zero downtime migration&lt;/strong&gt; with traffic splitting&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Hybrid VM + Kubernetes&lt;/strong&gt; architecture
&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Instant rollback&lt;/strong&gt; capability&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Feature flagging&lt;/strong&gt; with header-based routing&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Production-ready&lt;/strong&gt; traffic management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This approach is used by:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Netflix (microservices migration)&lt;/li&gt;
&lt;li&gt;Spotify (platform modernization)
&lt;/li&gt;
&lt;li&gt;Airbnb (infrastructure consolidation)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  💬 &lt;strong&gt;Let's Connect&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If you implement this migration strategy or face any challenges, I'd love to hear about it!&lt;/p&gt;

&lt;p&gt;GitHub → &lt;a href="https://github.com/sumitroyyy/Zero-Down-time-migration-to-k8s/tree/main" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;br&gt;
LinkedIn → &lt;a href="https://www.linkedin.com/in/sumit-roy-299476150/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;br&gt;
X (Twitter) → &lt;a href="https://x.com/Royy9007" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Drop a star ⭐ on the repo if it helped you — it keeps me motivated to write more experiments like this!&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;Questions about your specific migration scenario?&lt;/strong&gt; Let's discuss in the comments below. Every legacy system has unique challenges, and I've probably faced something similar!&lt;/p&gt;




</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>cloud</category>
      <category>opensource</category>
    </item>
    <item>
      <title>📊 Adding Observability to Gemma 2B on Kubernetes with Prometheus &amp; Grafana</title>
      <dc:creator>Sumit Roy</dc:creator>
      <pubDate>Fri, 19 Sep 2025 23:09:04 +0000</pubDate>
      <link>https://forem.com/sumit_roy9007/adding-observability-to-gemma-2b-on-kubernetes-with-prometheus-grafana-4aj</link>
      <guid>https://forem.com/sumit_roy9007/adding-observability-to-gemma-2b-on-kubernetes-with-prometheus-grafana-4aj</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/sumit_roy9007/running-gemma-2b-on-kubernetes-k3d-with-ollama-a-complete-local-ai-setup-2knp"&gt;Article 1&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we first set up Prometheus + Grafana for Gemma 2B on Kubernetes, I expected to see nice dashboards with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tokens per request&lt;/li&gt;
&lt;li&gt;Latency per inference&lt;/li&gt;
&lt;li&gt;Number of inferences processed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…but all we got were boring container metrics: CPU%, memory usage, restarts.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Sure, they told us the pod was alive, but nothing about the model itself.&lt;br&gt;
No clue if inference was slow, if requests were timing out, or how many tokens were processed.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;🔍 &lt;strong&gt;Debugging the Metrics Problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We checked:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prometheus scraping the Ollama pod? ✅&lt;/li&gt;
&lt;li&gt;Grafana dashboards connected? ✅&lt;/li&gt;
&lt;li&gt;Metrics endpoint on Ollama? ❌&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s when we realized:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ollama by default doesn’t expose model-level metrics.&lt;/li&gt;
&lt;li&gt;It only serves the API for inference, nothing else.&lt;/li&gt;
&lt;li&gt;Prometheus was scraping… nothing useful.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;💡 &lt;strong&gt;The Fix: Ollama Exporter as Sidecar&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While digging through GitHub issues, we found a project: &lt;strong&gt;Ollama Exporter&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It runs as a sidecar container inside the same pod as Ollama, talks to the Ollama API, and exposes real metrics at /metrics for Prometheus.&lt;/p&gt;

&lt;p&gt;Basically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ Ollama Pod ]
    ├── Ollama Server (API → 11434)
    └── Ollama Exporter (Metrics → 11435)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🛠 &lt;strong&gt;How We Integrated It&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here’s the snippet we added to the Ollama deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- name: ollama-exporter
  image: ghcr.io/jmorganca/ollama-exporter:latest
  ports:
    - containerPort: 11435
  env:
    - name: OLLAMA_HOST
      value: "http://localhost:11434"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And in Prometheus config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;scrape_configs:
  - job_name: 'ollama'
    static_configs:
      - targets: ['ollama-service:11435']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📊 &lt;strong&gt;The Metrics We Finally Got&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After adding the exporter, Grafana lit up with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Metric Name         What It Shows
ollama_requests_total   Number of inference requests
ollama_latency_seconds  Latency per inference request
ollama_tokens_processed Tokens processed per inference
ollama_model_load_time  Time taken to load Gemma 2B model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Suddenly, we had real model observability, not just pod health.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;🚀 &lt;strong&gt;Lessons Learned&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Default Kubernetes metrics ≠ Model metrics → You need a sidecar like Ollama Exporter.&lt;/li&gt;
&lt;li&gt;One scrape job away → Prometheus won’t scrape what you don’t tell it to.&lt;/li&gt;
&lt;li&gt;Metrics help tuning → We later used these metrics to set CPU/memory requests properly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🔮 &lt;strong&gt;What’s Next?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now that we have model-level observability, the next steps are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adding alerting rules for latency spikes or token errors.&lt;/li&gt;
&lt;li&gt;Exporting historical metrics into long-term storage (e.g., Thanos or Mimir).&lt;/li&gt;
&lt;li&gt;Trying multiple models (&lt;code&gt;Gemma 3, LLaMA 3, Phi-3&lt;/code&gt;) and comparing inference latency across them.&lt;/li&gt;
&lt;/ul&gt;
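
&lt;p&gt;For the alerting piece, a minimal Prometheus rule might look like this — the 2s threshold and 10m window are placeholders to tune against your own latency baseline, and it again assumes &lt;code&gt;ollama_latency_seconds&lt;/code&gt; is a histogram:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;groups:
  - name: ollama-alerts
    rules:
      - alert: OllamaHighLatency
        expr: histogram_quantile(0.95, rate(ollama_latency_seconds_bucket[5m])) &gt; 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p95 Ollama inference latency above 2s for 10 minutes"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;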

&lt;p&gt;💬 &lt;strong&gt;Let’s Connect&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you try this setup or improve it, I’d love to hear from you!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub → &lt;a href="https://github.com/sumitroyyy/Ollama_gemma3_k3d_Observability" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LinkedIn → &lt;a href="https://www.linkedin.com/in/sumit-roy-299476150/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;X (Twitter) → &lt;a href="https://x.com/Royy9007" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop a star ⭐ on the repo if it helped you — it keeps me motivated to write more experiments like this!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Running Gemma 2B on Kubernetes (k3d) with Ollama: A Complete Local AI Setup</title>
      <dc:creator>Sumit Roy</dc:creator>
      <pubDate>Fri, 19 Sep 2025 22:50:19 +0000</pubDate>
      <link>https://forem.com/sumit_roy9007/running-gemma-2b-on-kubernetes-k3d-with-ollama-a-complete-local-ai-setup-2knp</link>
      <guid>https://forem.com/sumit_roy9007/running-gemma-2b-on-kubernetes-k3d-with-ollama-a-complete-local-ai-setup-2knp</guid>
      <description>&lt;p&gt;I was fascinated by how people were running large language models locally, fully offline, without depending on expensive GPU clusters or cloud APIs.&lt;/p&gt;

&lt;p&gt;But when I tried deploying Gemma 2B manually on my machine, the process was messy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Large model weights needed downloading&lt;/li&gt;
&lt;li&gt;Restarting the container meant re-downloading everything&lt;/li&gt;
&lt;li&gt;No orchestration or resilience — if the container died, my setup was gone&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So, I asked myself:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“Can I run Gemma 2B efficiently, fully containerized, orchestrated by Kubernetes, with a clean local setup?”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The answer: Yes. Using k3d + Ollama + Kubernetes + Gemma 2B.&lt;/p&gt;

&lt;p&gt;🎯 &lt;strong&gt;What You’ll Learn&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deploy Gemma 2B using Ollama inside a k3d Kubernetes cluster&lt;/li&gt;
&lt;li&gt;Expose it via a service for local access&lt;/li&gt;
&lt;li&gt;Persist model weights to avoid re-downloading&lt;/li&gt;
&lt;li&gt;Basic troubleshooting for pods and containers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;🛠️ &lt;strong&gt;Tech Stack&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;k3d → Lightweight Kubernetes cluster inside Docker&lt;/li&gt;
&lt;li&gt;Ollama → Container for running LLMs locally&lt;/li&gt;
&lt;li&gt;Gemma 2B → Lightweight LLM (~1.7GB) from Google, runs locally&lt;/li&gt;
&lt;li&gt;WSL2 → Linux environment on Windows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;📚 &lt;strong&gt;Concepts Before We Start&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;What is Ollama&lt;/strong&gt;?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Ollama is a simple tool for running LLMs locally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pulls models like Gemma, Llama, Phi&lt;/li&gt;
&lt;li&gt;Provides a REST API for inference&lt;/li&gt;
&lt;li&gt;Runs entirely offline once weights are downloaded&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ollama run gemma:2b&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Gives you a local chatbot with zero cloud dependency.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;&lt;strong&gt;Why Kubernetes (k3d)?&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Instead of running Ollama bare-metal, we use k3d:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local K8s cluster → k3d runs Kubernetes inside Docker, very lightweight&lt;/li&gt;
&lt;li&gt;Pods &amp;amp; PVCs → Pods run containers, PVCs store model weights&lt;/li&gt;
&lt;li&gt;Services → Expose the Ollama API on localhost easily&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="3"&gt;
&lt;li&gt;&lt;strong&gt;Storage with PVC&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Without PVCs, if your pod dies, you lose model weights.&lt;br&gt;
PVC ensures models survive restarts and redeployments.&lt;/p&gt;

&lt;p&gt;🧑‍💻 &lt;strong&gt;Step-by-Step Setup&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Step 1: &lt;strong&gt;Install k3d and create a cluster&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash&lt;/code&gt;&lt;br&gt;
&lt;code&gt;k3d cluster create gemma-cluster --agents 1 --servers 1 -p "11434:11434@loadbalancer"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;-p&lt;/code&gt; mapping exposes the cluster's load balancer on localhost, which is what makes the Ollama service reachable at &lt;code&gt;localhost:11434&lt;/code&gt; in Step 4.&lt;/p&gt;

&lt;p&gt;Step 2: &lt;strong&gt;Deploy Ollama + Gemma 2B&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create ollama-deployment.yaml:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434
        volumeMounts:
        - name: model-storage
          mountPath: /root/.ollama
      volumes:
      - name: model-storage
        persistentVolumeClaim:
          claimName: ollama-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Service
metadata:
  name: ollama-service
spec:
  selector:
    app: ollama
  ports:
  - protocol: TCP
    port: 11434
    targetPort: 11434
  type: LoadBalancer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Apply it:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl apply -f ollama-deployment.yaml&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Step 3: &lt;strong&gt;Pull Gemma 2B Model&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;kubectl exec -it deploy/ollama -- ollama pull gemma:2b&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Step 4: &lt;strong&gt;Test the API&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl http://localhost:11434/api/generate -d '{
  "model": "gemma:2b",
  "prompt": "Write a short poem about Kubernetes"
}'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
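
&lt;p&gt;By default, &lt;code&gt;/api/generate&lt;/code&gt; streams the reply as newline-delimited JSON chunks. If you prefer a single JSON object containing the full response, Ollama accepts a &lt;code&gt;stream&lt;/code&gt; flag in the request body:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl http://localhost:11434/api/generate -d '{
  "model": "gemma:2b",
  "prompt": "Write a short poem about Kubernetes",
  "stream": false
}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;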



&lt;p&gt;🐞 &lt;strong&gt;Problems I Faced &amp;amp; Fixes&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pod in CrashLoopBackOff → Increased CPU/RAM in the deployment spec&lt;/li&gt;
&lt;li&gt;Model re-downloading on restart → Used a PVC to persist weights&lt;/li&gt;
&lt;li&gt;Port not accessible → Used a LoadBalancer service + k3d port mapping&lt;/li&gt;
&lt;/ol&gt;
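
&lt;p&gt;For the CrashLoopBackOff fix, the container needs explicit resource requests and limits in the deployment spec. A sketch of what that can look like under the &lt;code&gt;ollama&lt;/code&gt; container — the values are illustrative and depend on your machine (Gemma 2B's weights alone are ~1.7GB):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resources:
  requests:
    cpu: "1"
    memory: 2Gi
  limits:
    cpu: "2"
    memory: 4Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;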

&lt;p&gt;📂 Final Project Structure&lt;br&gt;
gemma-k3d/&lt;br&gt;
├── ollama-deployment.yaml&lt;br&gt;
├── k3d-cluster-setup.sh&lt;br&gt;
└── README.md&lt;/p&gt;

&lt;p&gt;🚀 &lt;strong&gt;Next Steps&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the next article, we’ll add Prometheus + Grafana to monitor:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;CPU usage&lt;/li&gt;
&lt;li&gt;Memory usage&lt;/li&gt;
&lt;li&gt;Latency per inference&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;💬 &lt;strong&gt;Let’s Connect&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you try this setup or improve it, I’d love to hear from you!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub → &lt;a href="https://github.com/sumitroyyy/Ollama_gemma3_k3d_Observability" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LinkedIn → &lt;a href="https://www.linkedin.com/in/sumit-roy-299476150/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;X (Twitter) → &lt;a href="https://x.com/Royy9007" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop a star ⭐ on the repo if it helped you — it keeps me motivated to write more experiments like this!&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
