<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Anupam Kushwaha</title>
    <description>The latest articles on Forem by Anupam Kushwaha (@anupam_kushwaha_85).</description>
    <link>https://forem.com/anupam_kushwaha_85</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3758282%2Fbdc81381-ebe4-4df9-83a2-1c63d64c7025.jpg</url>
      <title>Forem: Anupam Kushwaha</title>
      <link>https://forem.com/anupam_kushwaha_85</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/anupam_kushwaha_85"/>
    <language>en</language>
    <item>
      <title>Why My Microservices Broke on OpenShift — And How a Hidden Kubernetes Quota Nearly Cost Me Days</title>
      <dc:creator>Anupam Kushwaha</dc:creator>
      <pubDate>Wed, 06 May 2026 21:17:50 +0000</pubDate>
      <link>https://forem.com/anupam_kushwaha_85/why-my-microservices-broke-on-openshift-and-how-a-hidden-kubernetes-quota-nearly-cost-me-days-22g5</link>
      <guid>https://forem.com/anupam_kushwaha_85/why-my-microservices-broke-on-openshift-and-how-a-hidden-kubernetes-quota-nearly-cost-me-days-22g5</guid>
      <description>&lt;p&gt;Deploying four Spring Boot microservices to OpenShift Developer Sandbox, I hit two silent failures — a ReplicaSet quota exhaustion and a gateway routing to localhost. Here is the full debugging story.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If you're deploying microservices on OpenShift's free Developer Sandbox (or any resource-constrained Kubernetes cluster), this post might save you hours of debugging.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I built a production-grade mobile application backed by a &lt;strong&gt;microservices architecture&lt;/strong&gt; — four Spring Boot services deployed to &lt;strong&gt;Red Hat OpenShift Developer Sandbox&lt;/strong&gt; via a fully automated &lt;strong&gt;GitHub Actions CI/CD pipeline&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Flutter&lt;/strong&gt; mobile frontend (automated APK releases)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway&lt;/strong&gt; (Spring Cloud Gateway) — single entry point for all client requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth Service&lt;/strong&gt; — handles registration, login, OTP verification, JWT tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User Service&lt;/strong&gt; — user profiles, preferences, settings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core Service&lt;/strong&gt; — main business logic, AI features, data processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MongoDB Atlas&lt;/strong&gt; — separate databases per service&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Container Registry (GHCR)&lt;/strong&gt; — Docker image hosting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenShift Developer Sandbox&lt;/strong&gt; — free-tier Kubernetes hosting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything was containerized, secrets-managed, health-probed, and CI/CD automated. It worked flawlessly on localhost. Then I deployed it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It broke. For two days.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Here is how the system is wired:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F482vv1yxora0dt0lpc9x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F482vv1yxora0dt0lpc9x.png" alt="Architecture" width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The mobile app hits the &lt;strong&gt;API Gateway&lt;/strong&gt; via an OpenShift Route (HTTPS). The gateway reads the URL path and forwards it to the correct internal microservice via &lt;strong&gt;Kubernetes Service DNS names&lt;/strong&gt; (e.g., &lt;code&gt;http://app-auth-svc:8080&lt;/code&gt;).&lt;/p&gt;
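
&lt;p&gt;For context, each of those DNS names is just a Kubernetes Service selecting the pods behind it. A minimal sketch of what backs &lt;code&gt;app-auth-svc&lt;/code&gt; (the label and ports are assumptions, not the exact manifest):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: app-auth-svc     # becomes the in-namespace DNS name app-auth-svc
spec:
  selector:
    app: app-auth        # matches the auth service pods
  ports:
    - port: 8080         # the port the gateway calls
      targetPort: 8080   # the container port inside the pod
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;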

&lt;h3&gt;
  
  
  The CI/CD Pipeline
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fxbefubaqog7vslfb3c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fxbefubaqog7vslfb3c.png" alt="CICD flow" width="800" height="91"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every push to &lt;code&gt;main&lt;/code&gt; triggers a GitHub Actions workflow that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Builds all 4 services with Maven&lt;/li&gt;
&lt;li&gt;Creates Docker images and pushes to GHCR&lt;/li&gt;
&lt;li&gt;Logs into OpenShift via CLI (&lt;code&gt;oc login&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Creates/updates Kubernetes secrets (MongoDB URIs, JWT secret, API keys)&lt;/li&gt;
&lt;li&gt;Applies all deployment manifests&lt;/li&gt;
&lt;li&gt;Runs &lt;code&gt;oc rollout restart&lt;/code&gt; on each deployment&lt;/li&gt;
&lt;li&gt;Waits for health checks to pass&lt;/li&gt;
&lt;/ol&gt;
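
&lt;p&gt;The OpenShift half of that workflow boils down to a handful of &lt;code&gt;oc&lt;/code&gt; commands. A minimal sketch, with placeholder names (&lt;code&gt;k8s/&lt;/code&gt;, the server and token variables) standing in for the real ones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Log in with a token stored as a GitHub Actions secret
oc login "$OPENSHIFT_SERVER" --token="$OPENSHIFT_TOKEN"

# Apply every manifest, then force new pods to pick up the fresh images
oc apply -f k8s/
for dep in app-auth app-user app-core app-gateway; do
  oc rollout restart "deployment/$dep"
  oc rollout status "deployment/$dep" --timeout=300s   # fails the job if pods never go ready
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;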

&lt;p&gt;Sounds bulletproof, right? Here is where it fell apart.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Symptom
&lt;/h2&gt;

&lt;p&gt;After deploying, the app showed one message on every action — registration, login, anything:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"Something went wrong. Please try again later."&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The classic generic error that tells you absolutely nothing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bug #1: The Silent Quota Killer
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What I Saw
&lt;/h3&gt;

&lt;p&gt;The CI/CD pipeline failed with this in the logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AuthServiceApplication - Started AuthServiceApplication in 36.106 seconds
...
Error from server (BadRequest): previous terminated container "app-auth" 
in pod "app-auth-xxxxx-xxxxx" not found
Error: Process completed with exit code 1.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Confusing, right? The auth service &lt;strong&gt;clearly started successfully&lt;/strong&gt; (36 seconds, listening on port 8080). But the deployment was marked as &lt;strong&gt;failed&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Digging Deeper
&lt;/h3&gt;

&lt;p&gt;Looking at the pod events, I found:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Readiness probe failed: Get "http://10.x.x.x:8080/actuator/health": 
  connection refused
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And buried further down, the &lt;strong&gt;real&lt;/strong&gt; error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;replicasets.apps is forbidden: exceeded quota
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What Actually Happened
&lt;/h3&gt;

&lt;p&gt;Here is what most people do not know about Kubernetes deployments:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every time you run &lt;code&gt;oc rollout restart&lt;/code&gt; (or &lt;code&gt;kubectl rollout restart&lt;/code&gt;), Kubernetes does not just restart your pods.&lt;/strong&gt; It creates an entirely &lt;strong&gt;new ReplicaSet&lt;/strong&gt; while keeping the old ones around as rollback history.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzn5208485fjpgrf6pqa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzn5208485fjpgrf6pqa.png" alt="quota" width="333" height="699"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By default, Kubernetes keeps the &lt;strong&gt;last 10 old ReplicaSets&lt;/strong&gt; per deployment, on top of the active one (controlled by &lt;code&gt;revisionHistoryLimit&lt;/code&gt;, which defaults to &lt;code&gt;10&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Now multiply that by 4 microservices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;4 services × 10 ReplicaSets = 40 ReplicaSets&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The OpenShift Developer Sandbox (free tier) has a &lt;strong&gt;strict quota&lt;/strong&gt; on the total number of ReplicaSets allowed in your namespace. After just a few CI/CD runs, I silently hit that ceiling.&lt;/p&gt;
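
&lt;p&gt;Two quick commands show where you stand:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# How many ReplicaSets have piled up in the namespace?
oc get rs --no-headers | wc -l

# What does the namespace quota actually allow?
oc describe quota
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;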

&lt;p&gt;When the quota is exceeded:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Kubernetes &lt;strong&gt;cannot create new ReplicaSets&lt;/strong&gt; for the rollout&lt;/li&gt;
&lt;li&gt;No new ReplicaSet = &lt;strong&gt;no new pods&lt;/strong&gt; get scheduled&lt;/li&gt;
&lt;li&gt;No pods = readiness probe has &lt;strong&gt;nothing to connect to&lt;/strong&gt; → &lt;code&gt;connection refused&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Rollout waits... and eventually times out → &lt;code&gt;context deadline exceeded&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Pipeline fails with &lt;code&gt;exit code 1&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The app code was perfectly fine. Kubernetes just silently refused to create pods.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;

&lt;p&gt;There are two approaches, and I recommend using &lt;strong&gt;both&lt;/strong&gt;:&lt;/p&gt;

&lt;h4&gt;
  
  
  Fix A: Set &lt;code&gt;revisionHistoryLimit&lt;/code&gt; in Your Deployments (Best Practice)
&lt;/h4&gt;

&lt;p&gt;Add &lt;code&gt;revisionHistoryLimit: 1&lt;/code&gt; to &lt;strong&gt;every&lt;/strong&gt; deployment manifest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-service&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;revisionHistoryLimit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;    &lt;span class="c1"&gt;# Only keep 1 old ReplicaSet&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-service&lt;/span&gt;
  &lt;span class="c1"&gt;# ... rest of your spec&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;1&lt;/code&gt; and not &lt;code&gt;0&lt;/code&gt;?&lt;/strong&gt; Setting it to &lt;code&gt;0&lt;/code&gt; means Kubernetes keeps &lt;strong&gt;zero rollback history&lt;/strong&gt;. If a bad deployment goes out, you cannot do &lt;code&gt;oc rollout undo&lt;/code&gt; to instantly revert. Keeping &lt;code&gt;1&lt;/code&gt; gives you exactly one rollback point — enough for safety without wasting quota. This is the &lt;strong&gt;best practice&lt;/strong&gt; because if anything goes wrong with a new deployment, you still have an instant rollback option.&lt;/p&gt;
&lt;/blockquote&gt;
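
&lt;p&gt;That one retained revision is what makes the instant revert possible:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Inspect the available revisions, then roll back to the previous ReplicaSet
oc rollout history deployment/app-auth
oc rollout undo deployment/app-auth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;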

&lt;p&gt;With 4 services at &lt;code&gt;revisionHistoryLimit: 1&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;4 services × (1 current + 1 old) = 8 ReplicaSets&lt;/strong&gt; — well within any quota.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Fix B: Add Cleanup to Your Deploy Script (Recovery Safety Net)
&lt;/h4&gt;

&lt;p&gt;Add this &lt;strong&gt;before&lt;/strong&gt; the rollout restart commands in your deployment script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clean up old ReplicaSets to avoid quota issues on free-tier clusters&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Cleaning up old ReplicaSets..."&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;dep &lt;span class="k"&gt;in &lt;/span&gt;app-auth app-user app-core app-gateway&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="c"&gt;# Get all ReplicaSets for this deployment, sorted oldest first&lt;/span&gt;
  &lt;span class="c"&gt;# Delete all except the most recent one&lt;/span&gt;
  &lt;span class="nv"&gt;OLD_RS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;oc get rs &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="s2"&gt;"app=&lt;/span&gt;&lt;span class="nv"&gt;$dep&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--sort-by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;.metadata.creationTimestamp &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-o&lt;/span&gt; name 2&amp;gt;/dev/null | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$OLD_RS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$OLD_RS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | xargs oc delete
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  Cleaned old ReplicaSets for &lt;/span&gt;&lt;span class="nv"&gt;$dep&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;fi
done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Fix B is useful for &lt;strong&gt;one-time recovery&lt;/strong&gt; when you have already hit the quota, or as a safety net alongside Fix A. But &lt;strong&gt;Fix A is the real solution&lt;/strong&gt; — it is declarative, permanent, and prevents the problem from ever occurring again.&lt;/p&gt;
&lt;/blockquote&gt;
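
&lt;p&gt;If the quota has already locked you out and you just need to recover once, a blunter one-off also works. Here is a sketch that deletes every ReplicaSet currently scaled to zero (i.e., retired rollout history); review the output of &lt;code&gt;oc get rs&lt;/code&gt; before running it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Column 2 of `oc get rs` is DESIRED; 0 desired replicas = old history
oc get rs --no-headers | awk '$2 == 0 {print $1}' | xargs -r oc delete rs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;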




&lt;h2&gt;
  
  
  Bug #2: The Gateway That Routed to Itself
&lt;/h2&gt;

&lt;p&gt;Even after fixing the quota issue and getting all pods running, the app &lt;strong&gt;still&lt;/strong&gt; did not work. Registration still failed.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Clue
&lt;/h3&gt;

&lt;p&gt;I hit the gateway health endpoint directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://my-gateway-route.apps.openshiftapps.com/actuator/health
&lt;span class="c"&gt;# → 200 OK ✅&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gateway was healthy. But hitting an actual API route:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://my-gateway-route.apps.openshiftapps.com/api/auth/signup
&lt;span class="c"&gt;# → 502 Bad Gateway ❌&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;My API Gateway's &lt;code&gt;application.yml&lt;/code&gt; had &lt;strong&gt;hardcoded localhost URLs&lt;/strong&gt; for routing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;cloud&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;auth-service&lt;/span&gt;
          &lt;span class="na"&gt;uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://localhost:7071&lt;/span&gt;        &lt;span class="c1"&gt;# Works on my laptop&lt;/span&gt;
          &lt;span class="na"&gt;predicates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Path=/api/auth/**&lt;/span&gt;

        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;user-service&lt;/span&gt;
          &lt;span class="na"&gt;uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://localhost:7072&lt;/span&gt;        &lt;span class="c1"&gt;# Works on my laptop&lt;/span&gt;
          &lt;span class="na"&gt;predicates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Path=/api/users/**&lt;/span&gt;

        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;core-service&lt;/span&gt;
          &lt;span class="na"&gt;uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://localhost:7073&lt;/span&gt;        &lt;span class="c1"&gt;# Works on my laptop&lt;/span&gt;
          &lt;span class="na"&gt;predicates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Path=/api/core/**&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;On my machine&lt;/strong&gt;, all 4 services run on the same host (localhost) on different ports. It works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On Kubernetes&lt;/strong&gt;, each service runs in a &lt;strong&gt;separate pod&lt;/strong&gt; with its own network namespace. &lt;code&gt;localhost:7071&lt;/code&gt; inside the gateway pod is just... the gateway pod itself. There is nothing listening on port 7071 there.&lt;/p&gt;
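
&lt;p&gt;You can see this from inside the gateway pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# localhost inside the gateway pod is the gateway itself: nothing on 7071
oc exec &amp;lt;gateway-pod&amp;gt; -- curl -s http://localhost:7071/actuator/health
# → connection refused ❌

# The Service DNS name is what actually reaches the auth pod
oc exec &amp;lt;gateway-pod&amp;gt; -- curl -s http://app-auth-svc:8080/actuator/health
# → {"status":"UP"} ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;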

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frd0c48av32nhskn75b3a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frd0c48av32nhskn75b3a.png" alt="gateway self routing" width="697" height="702"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Irony
&lt;/h3&gt;

&lt;p&gt;My deploy script &lt;strong&gt;already created&lt;/strong&gt; the correct internal URLs as Kubernetes secrets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;oc create secret generic app-secrets &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--from-literal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;AUTH_SERVICE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"http://app-auth-svc:8080"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--from-literal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;USER_SERVICE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"http://app-user-svc:8080"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--from-literal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;CORE_SERVICE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"http://app-core-svc:8080"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And my other services &lt;strong&gt;correctly used them&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# core-service application.yml — Correct&lt;/span&gt;
&lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;user-service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;base-url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${USER_SERVICE_URL:http://localhost:7072}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only the gateway was missed. The env vars were injected into the pod but never referenced in the routing config.&lt;/p&gt;
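
&lt;p&gt;For completeness, this is roughly how that injection side looks, assuming the deployment pulls the secret in via &lt;code&gt;secretKeyRef&lt;/code&gt; (the exact manifest may differ):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# gateway deployment (excerpt)
containers:
  - name: app-gateway
    env:
      - name: AUTH_SERVICE_URL
        valueFrom:
          secretKeyRef:
            name: app-secrets        # the secret created by the deploy script
            key: AUTH_SERVICE_URL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;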

&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;

&lt;p&gt;Replace hardcoded URLs with environment variable references (with localhost as the default for local development):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;cloud&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;auth-service&lt;/span&gt;
          &lt;span class="na"&gt;uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${AUTH_SERVICE_URL:http://localhost:7071}&lt;/span&gt;
          &lt;span class="na"&gt;predicates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Path=/api/auth/**&lt;/span&gt;

        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;user-service&lt;/span&gt;
          &lt;span class="na"&gt;uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${USER_SERVICE_URL:http://localhost:7072}&lt;/span&gt;
          &lt;span class="na"&gt;predicates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Path=/api/users/**&lt;/span&gt;

        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;core-service&lt;/span&gt;
          &lt;span class="na"&gt;uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${CORE_SERVICE_URL:http://localhost:7073}&lt;/span&gt;
          &lt;span class="na"&gt;predicates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Path=/api/core/**&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;${ENV_VAR:default}&lt;/code&gt; syntax means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;On Kubernetes&lt;/strong&gt;: uses the injected secret value → &lt;code&gt;http://app-auth-svc:8080&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On localhost&lt;/strong&gt;: falls back to the default → &lt;code&gt;http://localhost:7071&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One config, works everywhere.&lt;/p&gt;
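
&lt;p&gt;A quick way to sanity-check the fallback locally (the jar name is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# No env vars set → routes fall back to the localhost defaults
java -jar app-gateway.jar

# Env var set (as Kubernetes would inject it) → routes use the Service DNS name
AUTH_SERVICE_URL=http://app-auth-svc:8080 java -jar app-gateway.jar
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;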




&lt;h2&gt;
  
  
  The Complete Debugging Checklist
&lt;/h2&gt;

&lt;p&gt;If your microservices work locally but fail on OpenShift/Kubernetes, run through this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Check&lt;/th&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Are pods actually running?&lt;/td&gt;
&lt;td&gt;&lt;code&gt;oc get pods&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Are readiness probes passing?&lt;/td&gt;
&lt;td&gt;&lt;code&gt;oc describe pod &amp;lt;pod-name&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Can the pod start at all?&lt;/td&gt;
&lt;td&gt;&lt;code&gt;oc logs &amp;lt;pod-name&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Is there a quota issue?&lt;/td&gt;
&lt;td&gt;&lt;code&gt;oc describe quota&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;How many ReplicaSets exist?&lt;/td&gt;
&lt;td&gt;&lt;code&gt;oc get rs&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Is the gateway routing correctly?&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl &amp;lt;gateway-url&amp;gt;/actuator/health&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Are env vars injected properly?&lt;/td&gt;
&lt;td&gt;&lt;code&gt;oc exec &amp;lt;pod-name&amp;gt; -- env&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Is the service DNS resolving?&lt;/td&gt;
&lt;td&gt;&lt;code&gt;oc exec &amp;lt;gateway-pod&amp;gt; -- curl http://app-auth-svc:8080/actuator/health&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. "It works on my machine" extends to Kubernetes
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;localhost&lt;/code&gt; routing is the microservice equivalent of "works on my machine." Always use environment variables with sensible defaults so the same config works in both environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Kubernetes fails silently in ways you do not expect
&lt;/h3&gt;

&lt;p&gt;The ReplicaSet quota error did not crash my app. It did not log a warning. It just silently prevented new pods from being created, and the symptoms (readiness probe failure, connection refused) pointed me in completely the wrong direction.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Free-tier clusters have hidden constraints
&lt;/h3&gt;

&lt;p&gt;OpenShift Developer Sandbox, Google Cloud free tier, Azure free tier — they all have resource quotas that do not exist in your local Minikube or Docker Desktop Kubernetes. Always run &lt;code&gt;oc describe quota&lt;/code&gt; (or &lt;code&gt;kubectl describe quota&lt;/code&gt;) in your namespace to know your limits.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Set &lt;code&gt;revisionHistoryLimit&lt;/code&gt; from day one
&lt;/h3&gt;

&lt;p&gt;Do not wait until you hit the quota. Add &lt;code&gt;revisionHistoryLimit: 1&lt;/code&gt; to every deployment manifest as a standard practice. It keeps your cluster clean, stays within quotas, and still gives you one rollback point for safety.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. CI/CD amplifies configuration bugs
&lt;/h3&gt;

&lt;p&gt;When you deploy manually, you might catch issues because you are watching the logs. When CI/CD deploys automatically on every push, a configuration bug silently breaks production while you are still writing code, thinking everything is fine.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Bug&lt;/th&gt;
&lt;th&gt;Root Cause&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Pods not starting&lt;/td&gt;
&lt;td&gt;ReplicaSet quota exceeded from accumulated rollout history&lt;/td&gt;
&lt;td&gt;Set &lt;code&gt;revisionHistoryLimit: 1&lt;/code&gt; in deployment manifests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gateway 502&lt;/td&gt;
&lt;td&gt;Route URIs hardcoded to &lt;code&gt;localhost&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Use the &lt;code&gt;${ENV_VAR:default}&lt;/code&gt; pattern in the gateway config&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you are deploying microservices on a free-tier Kubernetes cluster and your deployments mysteriously stop working after a few CI/CD runs — &lt;strong&gt;check your ReplicaSet count&lt;/strong&gt;. That silent quota limit is probably the culprit.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you hit weird Kubernetes issues on free-tier clusters? I would love to hear about them — connect with me on &lt;a href="https://linkedin.com/in/anupamkushwaha85" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or check out more on &lt;a href="https://anupamkushwaha.me" rel="noopener noreferrer"&gt;anupamkushwaha.me&lt;/a&gt;. The full blog post is &lt;a href="https://anupamkushwaha.me/blog/openshift-microservices-quota-debugging" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>softwareengineering</category>
      <category>microservices</category>
    </item>
    <item>
      <title>How I cut AI calls by 95% without losing quality</title>
      <dc:creator>Anupam Kushwaha</dc:creator>
      <pubDate>Tue, 05 May 2026 14:03:08 +0000</pubDate>
      <link>https://forem.com/anupam_kushwaha_85/how-i-cut-ai-calls-by-95-without-losing-quality-28m8</link>
      <guid>https://forem.com/anupam_kushwaha_85/how-i-cut-ai-calls-by-95-without-losing-quality-28m8</guid>
      <description>&lt;h2&gt;
  
  
  The Hidden Cost of Calling AI Too Early
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;I stopped calling AI on every request — and everything got better.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;In one of my projects, I was generating AI-based insights from user activity.&lt;/p&gt;

&lt;p&gt;The initial design was simple:&lt;/p&gt;

&lt;p&gt;Every request for today’s insight → call the AI model → return a fresh response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET /api/insights/today
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At first, this felt clean and correct.&lt;/p&gt;

&lt;p&gt;But in practice, it created serious problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;429 rate limit errors within hours&lt;/li&gt;
&lt;li&gt;Daily quota exhausted before noon&lt;/li&gt;
&lt;li&gt;Random failures affecting users&lt;/li&gt;
&lt;li&gt;Costs scaling linearly with traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system was working — but it wasn’t sustainable.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Issue
&lt;/h2&gt;

&lt;p&gt;The problem wasn’t the AI provider.&lt;/p&gt;

&lt;p&gt;It was the &lt;strong&gt;trigger model&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The system never asked basic questions before making an expensive call:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Has anything actually changed?&lt;/li&gt;
&lt;li&gt;Did I already generate a response recently?&lt;/li&gt;
&lt;li&gt;Is the user even active today?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without these checks, every request was treated as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Generate a new insight now.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That assumption was the real bug.&lt;/p&gt;




&lt;h2&gt;
  
  
  The New Approach
&lt;/h2&gt;

&lt;p&gt;Instead of adding caching on top, I redesigned the system into an &lt;strong&gt;event-driven pipeline&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;AI became the &lt;strong&gt;last step&lt;/strong&gt;, not the default.&lt;/p&gt;




&lt;h2&gt;
  
  
  System Flow
&lt;/h2&gt;

&lt;p&gt;Here’s the simplified request flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TD
    A[Request for today's insight] --&amp;gt; B{Activity today?}
    B -- No --&amp;gt; C[Reuse latest insight or fallback]
    B -- Yes --&amp;gt; D{Meaningful change?}
    D -- No --&amp;gt; C
    D -- Yes --&amp;gt; E{Cooldown passed?}
    E -- No --&amp;gt; C
    E -- Yes --&amp;gt; F{Daily cap reached?}
    F -- Yes --&amp;gt; C
    F -- No --&amp;gt; G{Global AI limit reached?}
    G -- Yes --&amp;gt; H[Use deterministic fallback]
    G -- No --&amp;gt; I[Call AI model]
    I --&amp;gt; J[Persist insight]
    H --&amp;gt; J
    C --&amp;gt; J
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Most requests now end at a simple database read — not an AI call.&lt;/p&gt;
&lt;/blockquote&gt;










&lt;h2&gt;
  
  
  The Five-Layer Redesign
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Activity Gate
&lt;/h3&gt;

&lt;p&gt;Start with the cheapest check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="n"&gt;hasActivity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;activityService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;hasActivityToday&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;hasActivity&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;getLatestOrFallback&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If nothing happened → don’t call AI.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Event-Driven Triggers
&lt;/h3&gt;

&lt;p&gt;AI should only run when something meaningful changes.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user updates intent&lt;/li&gt;
&lt;li&gt;significant behavior change&lt;/li&gt;
&lt;li&gt;threshold crossed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No change → reuse previous insight.&lt;/p&gt;
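
&lt;p&gt;A minimal sketch of that gate, assuming a simple numeric activity score and the &lt;code&gt;activity-delta&lt;/code&gt; threshold from the config shown later (the names are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;// Compare today's activity against what the last insight was based on
int delta = Math.abs(todayScore - lastInsight.activityScore());

if (delta &amp;lt; activityDeltaThreshold) {   // e.g. insight.activity-delta: 30
    return getLatestOrFallback(userId, today);   // nothing meaningful changed
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;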




&lt;h3&gt;
  
  
  3. Cooldown Window
&lt;/h3&gt;

&lt;p&gt;Avoid frequent re-generation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;Duration cooldown = Duration.ofMinutes(30);
// lastGeneratedAt: when the previous insight was generated (illustrative name)
Duration elapsed = Duration.between(lastGeneratedAt, Instant.now());

// Duration has no &amp;lt; operator in Java; compare with compareTo
if (elapsed.compareTo(cooldown) &amp;lt; 0) {
    return getLatestOrFallback(userId, today);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prevents unnecessary repeated calls.&lt;/p&gt;




&lt;h3&gt;
  
  
  4. Per-User Daily Cap
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;todayCount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;getLatestOrFallback&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even active users shouldn’t trigger unlimited AI calls.&lt;/p&gt;




&lt;h3&gt;
  
  
  5. Global AI Guard
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dailyAiCalls&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;useFallback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This acts as a system-wide circuit breaker.&lt;/p&gt;
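
&lt;p&gt;One caveat: a &lt;code&gt;get()&lt;/code&gt; check followed by a separate increment can race under concurrent requests. A sketch of an atomic variant, assuming &lt;code&gt;dailyAiCalls&lt;/code&gt; is an &lt;code&gt;AtomicInteger&lt;/code&gt; that is reset daily:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;// Reserve a slot atomically; release it if we went over budget
if (dailyAiCalls.incrementAndGet() &amp;gt; maxAiCallsPerDay) {
    dailyAiCalls.decrementAndGet();
    useFallback = true;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;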




&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;

&lt;p&gt;All thresholds are configurable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;insight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;activity-delta&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
  &lt;span class="na"&gt;cooldown-minutes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
  &lt;span class="na"&gt;daily-cap-per-user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
  &lt;span class="na"&gt;max-ai-calls-per-day&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
  &lt;span class="na"&gt;freshness-window-hours&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows tuning without redeploying code.&lt;/p&gt;
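
&lt;p&gt;In Spring Boot, one natural way to bind that block is a typed properties record. A sketch (the class name is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;// Binds the insight.* properties shown above into one typed object.
// Register it via @ConfigurationPropertiesScan (or @EnableConfigurationProperties).
@ConfigurationProperties(prefix = "insight")
public record InsightProperties(
        int activityDelta,
        int cooldownMinutes,
        int dailyCapPerUser,
        int maxAiCallsPerDay,
        int freshnessWindowHours) {
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;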




&lt;h2&gt;
  
  
  What Changed
&lt;/h2&gt;

&lt;p&gt;After this redesign:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI calls dropped from ~100/day → ~5–10/day&lt;/li&gt;
&lt;li&gt;Rate limit errors disappeared&lt;/li&gt;
&lt;li&gt;Most requests became fast database reads&lt;/li&gt;
&lt;li&gt;Free-tier usage became sustainable&lt;/li&gt;
&lt;li&gt;System behavior became more predictable&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Engineering Takeaway
&lt;/h2&gt;

&lt;p&gt;AI should be the &lt;strong&gt;exception&lt;/strong&gt;, not the rule.&lt;/p&gt;

&lt;p&gt;A well-designed backend should first decide:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Is this request even worth sending to the model?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That decision layer — gating, triggers, cooldowns — is where the real engineering happens.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;If most requests can be handled using deterministic logic or cached state:&lt;/p&gt;

&lt;p&gt;Do that first.&lt;/p&gt;

&lt;p&gt;Use AI only when it actually adds value.&lt;/p&gt;

&lt;p&gt;That single shift can make your system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cheaper&lt;/li&gt;
&lt;li&gt;faster&lt;/li&gt;
&lt;li&gt;more reliable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;—and much easier to scale.&lt;/p&gt;

&lt;p&gt;Blog link: &lt;a href="https://anupamkushwaha.me/blog/stopped-calling-ai-on-every-request" rel="noopener noreferrer"&gt;https://anupamkushwaha.me/blog/stopped-calling-ai-on-every-request&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>backend</category>
      <category>buildinpublic</category>
      <category>techtalks</category>
    </item>
    <item>
      <title>From Scaffolding to Debugging: Spring Boot with GitHub Copilot CLI</title>
      <dc:creator>Anupam Kushwaha</dc:creator>
      <pubDate>Mon, 09 Feb 2026 18:33:58 +0000</pubDate>
      <link>https://forem.com/anupam_kushwaha_85/from-scaffolding-to-debugging-spring-boot-with-github-copilot-cli-3e3l</link>
      <guid>https://forem.com/anupam_kushwaha_85/from-scaffolding-to-debugging-spring-boot-with-github-copilot-cli-3e3l</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/github-2026-01-21"&gt;GitHub Copilot CLI Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built a production-ready Spring Boot REST API secured with JWT authentication.&lt;/p&gt;

&lt;p&gt;The application includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User registration and authentication&lt;/li&gt;
&lt;li&gt;Stateless JWT-based authorization&lt;/li&gt;
&lt;li&gt;Task management (CRUD operations)&lt;/li&gt;
&lt;li&gt;Layered architecture (controller, service, repository, entity, DTO)&lt;/li&gt;
&lt;li&gt;Multi-database support:

&lt;ul&gt;
&lt;li&gt;H2 for local development&lt;/li&gt;
&lt;li&gt;PostgreSQL for production readiness&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;This project explores whether GitHub Copilot CLI can act as a real engineering assistant — not just a code generator — while building a secure backend following real-world Spring Boot best practices.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;🔗 &lt;strong&gt;Repository:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/anupamkushwaha85/copilot-cli-springboot-jwt" rel="noopener noreferrer"&gt;https://github.com/anupamkushwaha85/copilot-cli-springboot-jwt&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Screenshots
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Project scaffolding with GitHub Copilot CLI&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2zb5c8gjcp8kfwt9i8ar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2zb5c8gjcp8kfwt9i8ar.png" alt="GitHub Copilot CLI generating the complete Spring Boot project structure, including entities, repositories, services, controllers, and security configuration." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JWT security and service layer generation&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffn2t86g6xkg7dl2asf7e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffn2t86g6xkg7dl2asf7e.png" alt="Copilot CLI generating JWT authentication, Spring Security filters, service layer logic, and REST controllers using iterative prompts." width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Debugging Spring Security (403 Forbidden issue)&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wjwrvxixwgyn8d6d0y5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wjwrvxixwgyn8d6d0y5.png" alt="Initial 403 Forbidden error when testing authentication endpoints, highlighting a real Spring Security configuration issue." width="800" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API testing with Postman&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi02fg5rinwy5ffq3xix4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi02fg5rinwy5ffq3xix4.png" alt="Successful user registration and authentication using JWT tokens." width="800" height="667"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Protected endpoints in action&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3c71lzpyoblfvxib529b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3c71lzpyoblfvxib529b.png" alt="JWT-secured task APIs accessed with authenticated requests." width="800" height="673"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  My Experience with GitHub Copilot CLI
&lt;/h2&gt;

&lt;p&gt;GitHub Copilot CLI was used throughout the entire development lifecycle — not just for initial scaffolding.&lt;/p&gt;

&lt;p&gt;Copilot CLI assisted with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generating the complete project structure and Maven configuration&lt;/li&gt;
&lt;li&gt;Creating entities, repositories, services, controllers, and DTOs&lt;/li&gt;
&lt;li&gt;Implementing JWT authentication and Spring Security configuration&lt;/li&gt;
&lt;li&gt;Adapting the application for H2 (development) and PostgreSQL (production)&lt;/li&gt;
&lt;li&gt;Generating documentation alongside the code&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Real Debugging with Copilot CLI
&lt;/h3&gt;

&lt;p&gt;One of the most valuable moments was debugging a &lt;code&gt;403 Forbidden&lt;/code&gt; error on the authentication endpoints.&lt;/p&gt;

&lt;p&gt;Instead of trial-and-error:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copilot CLI analyzed the Spring Security configuration&lt;/li&gt;
&lt;li&gt;Identified default form login and HTTP basic auth interference&lt;/li&gt;
&lt;li&gt;Suggested disabling them explicitly&lt;/li&gt;
&lt;li&gt;Provided the exact fix in &lt;code&gt;SecurityConfig&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
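
&lt;p&gt;For readers hitting the same 403, the shape of that fix, sketched with the Spring Security lambda DSL (illustrative, not the repository's exact code):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;// SecurityConfig excerpt: disable the defaults that intercepted the
// auth endpoints, and leave /api/auth/** open for registration/login
http
    .csrf(csrf -&amp;gt; csrf.disable())
    .formLogin(form -&amp;gt; form.disable())
    .httpBasic(basic -&amp;gt; basic.disable())
    .authorizeHttpRequests(auth -&amp;gt; auth
        .requestMatchers("/api/auth/**").permitAll()
        .anyRequest().authenticated());
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;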

&lt;p&gt;This experience showed that Copilot CLI is effective not only for writing code, but also for systematic debugging and root-cause analysis. It significantly reduced setup friction and allowed me to focus on architecture and correctness instead of boilerplate.&lt;/p&gt;

&lt;h4&gt;
  
  
  Outcome:
&lt;/h4&gt;

&lt;p&gt;A production-ready backend built end-to-end with GitHub Copilot CLI, demonstrating real-world usage beyond code generation.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>cli</category>
      <category>githubcopilot</category>
    </item>
  </channel>
</rss>
