<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Thej Deep</title>
    <description>The latest articles on Forem by Thej Deep (@thej_deep_457).</description>
    <link>https://forem.com/thej_deep_457</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3708043%2F539b2a96-f602-4f49-a468-30e41510b5c9.jpeg</url>
      <title>Forem: Thej Deep</title>
      <link>https://forem.com/thej_deep_457</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/thej_deep_457"/>
    <language>en</language>
    <item>
      <title>Strangler Fig on IBM Kubernetes: Modernizing a Monolith Without Breaking Production</title>
      <dc:creator>Thej Deep</dc:creator>
      <pubDate>Wed, 04 Feb 2026 00:18:47 +0000</pubDate>
      <link>https://forem.com/thej_deep_457/strangler-fig-on-ibm-kubernetes-modernizing-a-monolith-without-breaking-production-2ccg</link>
      <guid>https://forem.com/thej_deep_457/strangler-fig-on-ibm-kubernetes-modernizing-a-monolith-without-breaking-production-2ccg</guid>
      <description>&lt;h2&gt;
  
  
  Why the Strangler Fig Pattern Still Works
&lt;/h2&gt;

&lt;p&gt;Most enterprise monoliths don’t fail because of bad code.&lt;br&gt;&lt;br&gt;
They fail because changing them safely becomes too risky.&lt;/p&gt;

&lt;p&gt;A full rewrite to microservices sounds attractive, but in practice it often leads to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Long delivery cycles&lt;/li&gt;
&lt;li&gt;High data risk&lt;/li&gt;
&lt;li&gt;Business disruption&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;strong&gt;Strangler Fig pattern&lt;/strong&gt; offers a safer alternative:&lt;br&gt;&lt;br&gt;
modernize incrementally while keeping the system running.&lt;/p&gt;

&lt;p&gt;In this article, I walk through a &lt;strong&gt;step-by-step, production-safe approach&lt;/strong&gt; to applying the Strangler Fig pattern using &lt;strong&gt;IBM Cloud Kubernetes Service (IKS)&lt;/strong&gt;, including real commands and manifests you can run.&lt;/p&gt;


&lt;h2&gt;
  
  
  What You Will Build
&lt;/h2&gt;

&lt;p&gt;By the end of this guide, you will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Containerize an existing monolithic application&lt;/li&gt;
&lt;li&gt;Deploy it to IBM Cloud Kubernetes Service&lt;/li&gt;
&lt;li&gt;Place it behind Ingress&lt;/li&gt;
&lt;li&gt;Deploy a new “edge” service&lt;/li&gt;
&lt;li&gt;Route traffic gradually using path-based routing&lt;/li&gt;
&lt;li&gt;Keep rollback simple and safe&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;IBM Cloud account&lt;/li&gt;
&lt;li&gt;An existing IKS cluster&lt;/li&gt;
&lt;li&gt;Tools installed locally:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ibmcloud&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kubectl&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;docker&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Step 1: Log in to IBM Cloud and Connect to Kubernetes
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ibmcloud login &lt;span class="nt"&gt;-a&lt;/span&gt; https://cloud.ibm.com
ibmcloud target &lt;span class="nt"&gt;-r&lt;/span&gt; &amp;lt;REGION&amp;gt; &lt;span class="nt"&gt;-g&lt;/span&gt; &amp;lt;RESOURCE_GROUP&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Step 2: Create a Dedicated Namespace
&lt;/h2&gt;

&lt;p&gt;Keep modernization isolated and easy to clean up later.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create namespace monolith-demo
kubectl config set-context &lt;span class="nt"&gt;--current&lt;/span&gt; &lt;span class="nt"&gt;--namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;monolith-demo
kubectl get ns
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Containerize the Existing Monolith
&lt;/h2&gt;

&lt;p&gt;The goal here is zero behavior change; we simply package the monolith as it is.&lt;/p&gt;

&lt;p&gt;Example Dockerfile (Node.js monolith)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .

FROM node:20-alpine
WORKDIR /app
COPY --from=build /app /app
EXPOSE 8080
CMD ["npm","start"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add minimal health endpoints (if you don’t already have them)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Example endpoints
app.get("/health", (req, res) =&amp;gt; res.status(200).send("ok"));
app.get("/ready", (req, res) =&amp;gt; res.status(200).send("ready"));

Build the image:

docker build -t monolith:1.0.0 .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Push the Image to IBM Cloud Container Registry
&lt;/h2&gt;

&lt;p&gt;Log in to the registry and create a namespace (one-time):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ibmcloud cr login
ibmcloud cr namespace-add &amp;lt;REGISTRY_NAMESPACE&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tag and push your image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker tag monolith:1.0.0 &amp;lt;REGISTRY&amp;gt;/&amp;lt;REGISTRY_NAMESPACE&amp;gt;/monolith:1.0.0
docker push &amp;lt;REGISTRY&amp;gt;/&amp;lt;REGISTRY_NAMESPACE&amp;gt;/monolith:1.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify it exists:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ibmcloud cr images | grep monolith
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: Deploy the Monolith to Kubernetes
&lt;/h2&gt;

&lt;p&gt;5.1 Deployment manifest (deployment.yaml)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: monolith
spec:
  replicas: 2
  selector:
    matchLabels:
      app: monolith
  template:
    metadata:
      labels:
        app: monolith
    spec:
      containers:
        - name: monolith
          image: &amp;lt;REGISTRY&amp;gt;/&amp;lt;REGISTRY_NAMESPACE&amp;gt;/monolith:1.0.0
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply and confirm rollout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f deployment.yaml
kubectl rollout status deploy/monolith
kubectl get pods -l app=monolith
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;5.2 Service manifest (service.yaml)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: monolith-svc
spec:
  selector:
    app: monolith
  ports:
    - name: http
      port: 80
      targetPort: 8080
  type: ClusterIP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f service.yaml
kubectl get svc monolith-svc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Quick local test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward svc/monolith-svc 8080:80
curl -i http://localhost:8080/health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 6: Put the Monolith Behind Ingress
&lt;/h2&gt;

&lt;p&gt;Ingress becomes your routing control plane for strangling.&lt;/p&gt;

&lt;p&gt;Create ingress.yaml:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  rules:
    - host: &amp;lt;APP_DOMAIN&amp;gt;
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: monolith-svc
                port:
                  number: 80

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f ingress.yaml
kubectl get ingress app-ingress -o wide
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point: 100% traffic still goes to the monolith.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 7: Pick the First “Edge” Capability to Extract
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Start with something that is:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Low risk&lt;/li&gt;
&lt;li&gt;Clearly bounded&lt;/li&gt;
&lt;li&gt;Mostly read-only (minimal writes)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Good first choices:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;/api/auth/*&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/api/reporting/*&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Read-only catalog endpoints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this walkthrough, we’ll extract &lt;code&gt;/api/auth/*&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 8: Build the New Edge Service (auth-service)
&lt;/h2&gt;

&lt;p&gt;Minimal example endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app.get("/health", (req, res) =&amp;gt; res.status(200).send("ok"));
app.get("/ready", (req, res) =&amp;gt; res.status(200).send("ready"));


app.get("/api/auth/ping", (req, res) =&amp;gt; {
  res.json({ service: "auth-service", status: "pong" });
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Dockerfile for the new service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM node:20-alpine
WORKDIR /app
COPY . .
EXPOSE 8081
CMD ["node","server.js"]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build and push:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker build -t auth-service:1.0.0 .
docker tag auth-service:1.0.0 &amp;lt;REGISTRY&amp;gt;/&amp;lt;REGISTRY_NAMESPACE&amp;gt;/auth-service:1.0.0
docker push &amp;lt;REGISTRY&amp;gt;/&amp;lt;REGISTRY_NAMESPACE&amp;gt;/auth-service:1.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 9: Deploy the New Service to Kubernetes
&lt;/h2&gt;

&lt;p&gt;9.1 Deployment (auth-deploy.yaml)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: auth-service
  template:
    metadata:
      labels:
        app: auth-service
    spec:
      containers:
        - name: auth-service
          image: &amp;lt;REGISTRY&amp;gt;/&amp;lt;REGISTRY_NAMESPACE&amp;gt;/auth-service:1.0.0
          ports:
            - containerPort: 8081
          readinessProbe:
            httpGet:
              path: /ready
              port: 8081
            initialDelaySeconds: 5
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 8081
            initialDelaySeconds: 15
            periodSeconds: 10

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f auth-deploy.yaml
kubectl rollout status deploy/auth-service
kubectl get pods -l app=auth-service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;9.2 Service (auth-svc.yaml)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: auth-svc
spec:
  selector:
    app: auth-service
  ports:
    - name: http
      port: 80
      targetPort: 8081
  type: ClusterIP

Apply:
kubectl apply -f auth-svc.yaml
kubectl get svc auth-svc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 10: Strangle Traffic Using Ingress Routing
&lt;/h2&gt;

&lt;p&gt;Update ingress.yaml so /api/auth/* routes to the new service, and everything else stays on the monolith:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  rules:
    - host: &amp;lt;APP_DOMAIN&amp;gt;
      http:
        paths:
          - path: /api/auth
            pathType: Prefix
            backend:
              service:
                name: auth-svc
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: monolith-svc
                port:
                  number: 80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f ingress.yaml
kubectl get ingress app-ingress -o wide
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl http://&amp;lt;APP_DOMAIN&amp;gt;/api/auth/ping
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"service":"auth-service","status":"pong"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 11: Rollback Strategy
&lt;/h2&gt;

&lt;p&gt;Keep rollback boring and fast.&lt;/p&gt;

&lt;p&gt;Option A: Route back to monolith&lt;/p&gt;

&lt;p&gt;Edit Ingress and remove the /api/auth path (or point it to monolith-svc), then re-apply:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f ingress.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
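
&lt;p&gt;If you prefer pointing the path at the monolith rather than deleting it, the reverted rule is a one-line change; keeping the path in place makes a later re-cutover trivial. A minimal sketch of the reverted rule:&lt;/p&gt;

```yaml
# Same path as before; only the backend service changes back
- path: /api/auth
  pathType: Prefix
  backend:
    service:
      name: monolith-svc
      port:
        number: 80
```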



&lt;p&gt;Option B: Undo the deployment rollout&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl rollout undo deploy/auth-service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 12: Repeat the Pattern Safely
&lt;/h2&gt;

&lt;p&gt;Once the first extracted capability is stable:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Choose the next bounded domain&lt;/li&gt;
&lt;li&gt;Build it as a separate service&lt;/li&gt;
&lt;li&gt;Deploy it&lt;/li&gt;
&lt;li&gt;Route it with Ingress&lt;/li&gt;
&lt;li&gt;Keep rollback available at every step&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Over time:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The monolith shrinks&lt;/li&gt;
&lt;li&gt;Risk decreases&lt;/li&gt;
&lt;li&gt;Modernization becomes routine rather than a “big migration”&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What You Achieved
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;No downtime&lt;/li&gt;
&lt;li&gt;No big rewrite&lt;/li&gt;
&lt;li&gt;Production-safe modernization&lt;/li&gt;
&lt;li&gt;Kubernetes as an enabler, not a forcing function&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;The Strangler Fig pattern works because it respects reality.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You don’t modernize by deleting the past.&lt;br&gt;
You modernize by outgrowing it safely.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you’re sitting on a monolith today, this approach lets you move forward without breaking what already works.&lt;/p&gt;

</description>
      <category>ibmcloud</category>
      <category>kubernetes</category>
      <category>appmodernization</category>
      <category>cloudarchitecture</category>
    </item>
    <item>
      <title>Beyond Backups: Building Verifiable Cloud Recovery on IBM Cloud</title>
      <dc:creator>Thej Deep</dc:creator>
      <pubDate>Wed, 28 Jan 2026 03:48:44 +0000</pubDate>
      <link>https://forem.com/thej_deep_457/beyond-backups-building-verifiable-cloud-recovery-on-ibm-cloud-4g3l</link>
      <guid>https://forem.com/thej_deep_457/beyond-backups-building-verifiable-cloud-recovery-on-ibm-cloud-4g3l</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9jpfpefy2i2f1byh3w3t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9jpfpefy2i2f1byh3w3t.png" alt=" " width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Backups Are No Longer Enough
&lt;/h2&gt;

&lt;p&gt;Most cloud recovery strategies assume something dangerous:&lt;br&gt;&lt;br&gt;
that logs, metadata, and audit trails remain trustworthy after an attack.&lt;/p&gt;

&lt;p&gt;In real ransomware and APT incidents, attackers don’t stop at encrypting data.&lt;br&gt;&lt;br&gt;
They erase timelines, rewrite access trails, and poison audit logs.&lt;/p&gt;

&lt;p&gt;Recovery still happens, but &lt;strong&gt;without certainty&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This article explores how IBM Cloud can be used to design &lt;strong&gt;verifiable recovery architectures&lt;/strong&gt;, where restoration is based on cryptographic proof rather than trust.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Problem with Traditional Cloud Recovery
&lt;/h2&gt;

&lt;p&gt;Most environments rely on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Snapshot-based backups
&lt;/li&gt;
&lt;li&gt;Centralized audit logs
&lt;/li&gt;
&lt;li&gt;Time-based restore points
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These mechanisms fail under advanced attacks because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Logs live in the same trust boundary as workloads
&lt;/li&gt;
&lt;li&gt;Metadata is flat and mutable
&lt;/li&gt;
&lt;li&gt;Recovery tools assume audit trails are truthful
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once attackers gain lateral movement, &lt;strong&gt;forensics becomes speculation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;What’s missing is an &lt;strong&gt;independent validation plane&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  From Trusted Logs to Verifiable Evidence
&lt;/h2&gt;

&lt;p&gt;Instead of asking &lt;em&gt;“Which backup should we restore?”&lt;/em&gt;&lt;br&gt;&lt;br&gt;
we should ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Can we prove this data was not altered?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That shift requires three principles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Immutability at rest&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Independent verification of metadata&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cryptographic validation before recovery&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;IBM Cloud already provides the primitives to build this.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reference Architecture: Verifiable Recovery on IBM Cloud
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Workload &amp;amp; Event Capture Layer
&lt;/h3&gt;

&lt;p&gt;Applications run on IBM Cloud VPC or IBM Cloud Kubernetes Service.&lt;/p&gt;

&lt;p&gt;Every critical operation emits a provenance event:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Object hash&lt;/li&gt;
&lt;li&gt;Identity context&lt;/li&gt;
&lt;li&gt;Timestamp window&lt;/li&gt;
&lt;li&gt;Resource lineage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These events are streamed using &lt;strong&gt;IBM Event Streams (Kafka)&lt;/strong&gt;, ensuring ordering and durability.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Immutable Storage Layer
&lt;/h3&gt;

&lt;p&gt;All data is written to IBM Cloud Object Storage with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Object Lock (WORM)&lt;/li&gt;
&lt;li&gt;Retention policies&lt;/li&gt;
&lt;li&gt;Cross-region replication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even administrators cannot mutate stored objects.&lt;/p&gt;

&lt;p&gt;This ensures &lt;strong&gt;data immutability&lt;/strong&gt;, but immutability alone is not verification.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Independent Verification Plane
&lt;/h3&gt;

&lt;p&gt;Provenance hashes are committed to a blockchain-backed, append-only ledger.&lt;/p&gt;

&lt;p&gt;Smart contracts validate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hash consistency&lt;/li&gt;
&lt;li&gt;Write ordering&lt;/li&gt;
&lt;li&gt;Metadata integrity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This ledger exists &lt;strong&gt;outside the application trust boundary&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If attackers alter logs or metadata, verification fails.&lt;/p&gt;




&lt;h3&gt;
  
  
  4. Key Isolation &amp;amp; Zero-Trust Controls
&lt;/h3&gt;

&lt;p&gt;Encryption keys are managed using IBM Key Protect.&lt;/p&gt;

&lt;p&gt;Key release is conditional:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provenance verification must succeed&lt;/li&gt;
&lt;li&gt;IAM context must match expected behavior&lt;/li&gt;
&lt;li&gt;Blockchain state must confirm integrity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No verified state → no decryption → no recovery.&lt;/p&gt;




&lt;h3&gt;
  
  
  5. Forensic Intelligence &amp;amp; Recovery Decisions
&lt;/h3&gt;

&lt;p&gt;Instead of restoring blindly, the process analyzes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provenance graph anomalies
&lt;/li&gt;
&lt;li&gt;Lateral movement indicators
&lt;/li&gt;
&lt;li&gt;Suspicious metadata rewrites
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recovery teams receive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Confidence scores for restore points
&lt;/li&gt;
&lt;li&gt;Attack timeline reconstruction
&lt;/li&gt;
&lt;li&gt;Evidence-backed recovery recommendations
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What Changes Operationally
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Traditional Recovery&lt;/th&gt;
&lt;th&gt;Verifiable Recovery&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Restore snapshots&lt;/td&gt;
&lt;td&gt;Validate integrity first&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trust audit logs&lt;/td&gt;
&lt;td&gt;Prove audit trails&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recover quickly&lt;/td&gt;
&lt;td&gt;Recover correctly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Assume compliance&lt;/td&gt;
&lt;td&gt;Produce evidence&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Backups still matter.&lt;br&gt;&lt;br&gt;
But &lt;strong&gt;proof matters more&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters for IBM Cloud Practitioners
&lt;/h2&gt;

&lt;p&gt;This architecture demonstrates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero-trust recovery design
&lt;/li&gt;
&lt;li&gt;Blockchain as an infrastructure primitive
&lt;/li&gt;
&lt;li&gt;AI-assisted forensic validation
&lt;/li&gt;
&lt;li&gt;Compliance through evidence, not policy &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ibmcloud</category>
      <category>cloudsecurity</category>
      <category>cybersecurity</category>
      <category>cloudarchitecture</category>
    </item>
    <item>
      <title>AWS Security Issues You Can Actually Fix With Settings</title>
      <dc:creator>Thej Deep</dc:creator>
      <pubDate>Mon, 19 Jan 2026 04:52:15 +0000</pubDate>
      <link>https://forem.com/thej_deep_457/aws-security-issues-you-can-actually-fix-with-settings-f57</link>
      <guid>https://forem.com/thej_deep_457/aws-security-issues-you-can-actually-fix-with-settings-f57</guid>
      <description>&lt;h2&gt;
  
  
  AWS Security Issues You Can Actually Fix With Settings
&lt;/h2&gt;

&lt;p&gt;Most AWS security incidents are not caused by zero-day exploits or nation-state attacks.&lt;/p&gt;

&lt;p&gt;They happen because a default was left unchanged, a policy grew over time, or an automation path quietly gained more privilege than intended.&lt;/p&gt;

&lt;p&gt;The good news: many of today’s highest-impact AWS security risks can be fixed immediately using &lt;strong&gt;settings you already have access to&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Below are issues I keep seeing in real environments, along with how teams are fixing them &lt;strong&gt;without buying new tools&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. IAM Roles Quietly Becoming “Permanent Credentials”
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The issue
&lt;/h3&gt;

&lt;p&gt;IAM roles were designed to be short-lived and tightly scoped. In practice, many production roles now behave like permanent credentials:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Broad wildcard permissions (&lt;code&gt;*:*&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Long session durations&lt;/li&gt;
&lt;li&gt;Reuse across environments&lt;/li&gt;
&lt;li&gt;Assumed by CI/CD, humans, and automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates a single compromise point with no expiration pressure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why this still happens
&lt;/h3&gt;

&lt;p&gt;Because nothing breaks immediately.&lt;br&gt;&lt;br&gt;
The blast radius only shows up during an incident.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix (pure settings)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Reduce &lt;strong&gt;MaxSessionDuration&lt;/strong&gt; on sensitive roles (1–2 hours, not 12)&lt;/li&gt;
&lt;li&gt;Split roles by actor (human, pipeline, service)&lt;/li&gt;
&lt;li&gt;Add IAM policy &lt;strong&gt;Condition&lt;/strong&gt; blocks:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;aws:SourceVpc&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;aws:PrincipalArn&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;aws:RequestedRegion&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Enable &lt;strong&gt;IAM Access Analyzer&lt;/strong&gt; findings and treat them as defects, not suggestions&lt;/li&gt;

&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; This is one of the highest-ROI security fixes in AWS, and it costs nothing.&lt;/p&gt;
&lt;/blockquote&gt;
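
&lt;p&gt;As one hedged example, a deny statement with an &lt;code&gt;aws:RequestedRegion&lt;/code&gt; condition (the region values are placeholders) keeps a role from being used outside the regions you actually operate in:&lt;/p&gt;

```json
{
  "Effect": "Deny",
  "Action": "*",
  "Resource": "*",
  "Condition": {
    "StringNotEquals": {
      "aws:RequestedRegion": ["us-east-1", "us-west-2"]
    }
  }
}
```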




&lt;h2&gt;
  
  
  2. CloudTrail Logging Exists… But It Isn’t Defensible
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The issue
&lt;/h3&gt;

&lt;p&gt;Most accounts have CloudTrail enabled.&lt;br&gt;&lt;br&gt;
Very few have it configured to survive an attacker who already gained access.&lt;/p&gt;

&lt;p&gt;Common gaps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Logs stored in the same account being attacked&lt;/li&gt;
&lt;li&gt;No log file integrity validation&lt;/li&gt;
&lt;li&gt;No alerts on trail changes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Fix (pure settings)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Send CloudTrail logs to a &lt;strong&gt;central logging account&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Enable &lt;strong&gt;log file integrity validation&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Add EventBridge rules for:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;StopLogging&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;DeleteTrail&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;UpdateTrail&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Block CloudTrail modification using &lt;strong&gt;SCPs&lt;/strong&gt; in production OUs&lt;/li&gt;

&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Logging that can be deleted is not logging. It’s optimism.&lt;/p&gt;
&lt;/blockquote&gt;
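
&lt;p&gt;The EventBridge rule for trail tampering can use an event pattern along these lines; wire the rule’s target to your alerting channel of choice:&lt;/p&gt;

```json
{
  "source": ["aws.cloudtrail"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["cloudtrail.amazonaws.com"],
    "eventName": ["StopLogging", "DeleteTrail", "UpdateTrail"]
  }
}
```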




&lt;h2&gt;
  
  
  3. Security Groups Used as “Temporary Fixes” Permanently
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The issue
&lt;/h3&gt;

&lt;p&gt;Security groups are often modified during incidents or debugging.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Just open it to my IP for now.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Those rules almost never get removed.&lt;/p&gt;

&lt;p&gt;Over time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CIDRs widen&lt;/li&gt;
&lt;li&gt;Ports accumulate&lt;/li&gt;
&lt;li&gt;Original intent is lost&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Fix (pure settings + discipline)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Enable AWS Config rules to:

&lt;ul&gt;
&lt;li&gt;Disallow &lt;code&gt;0.0.0.0/0&lt;/code&gt; on sensitive ports&lt;/li&gt;
&lt;li&gt;Flag unused security group rules&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Use &lt;strong&gt;security group referencing&lt;/strong&gt; instead of IPs wherever possible&lt;/li&gt;

&lt;li&gt;Require PRs or change tickets for security group updates via &lt;strong&gt;IaC&lt;/strong&gt;
&lt;/li&gt;

&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Security groups are infrastructure. Treat them like code.&lt;/p&gt;
&lt;/blockquote&gt;
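
&lt;p&gt;For example, AWS Config’s managed rule &lt;code&gt;restricted-ssh&lt;/code&gt; flags any security group that allows &lt;code&gt;0.0.0.0/0&lt;/code&gt; on port 22; a minimal rule definition looks like this:&lt;/p&gt;

```json
{
  "ConfigRuleName": "restricted-ssh",
  "Source": {
    "Owner": "AWS",
    "SourceIdentifier": "INCOMING_SSH_DISABLED"
  }
}
```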




&lt;h2&gt;
  
  
  4. S3 Buckets Are “Private” Until They Aren’t
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The issue
&lt;/h3&gt;

&lt;p&gt;Most S3 breaches do &lt;strong&gt;not&lt;/strong&gt; involve public buckets.&lt;/p&gt;

&lt;p&gt;They involve implicitly accessible buckets, caused by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Over-permissive bucket policies&lt;/li&gt;
&lt;li&gt;Forgotten cross-account access&lt;/li&gt;
&lt;li&gt;IAM roles with &lt;code&gt;s3:*&lt;/code&gt; on &lt;code&gt;*&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Fix (pure settings)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Enable &lt;strong&gt;S3 Block Public Access&lt;/strong&gt; at the account level&lt;/li&gt;
&lt;li&gt;Turn on &lt;strong&gt;Access Analyzer for S3&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Require bucket policies to explicitly deny:

&lt;ul&gt;
&lt;li&gt;Unencrypted uploads&lt;/li&gt;
&lt;li&gt;Non-TLS access&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Enforce &lt;strong&gt;S3 server-side encryption&lt;/strong&gt;
&lt;/li&gt;

&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;If a bucket contains data you wouldn’t post publicly, its policy should read like a contract, not a guess.&lt;/p&gt;
&lt;/blockquote&gt;
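
&lt;p&gt;A bucket policy statement that refuses non-TLS access is short and standard (BUCKET_NAME is a placeholder):&lt;/p&gt;

```json
{
  "Sid": "DenyInsecureTransport",
  "Effect": "Deny",
  "Principal": "*",
  "Action": "s3:*",
  "Resource": [
    "arn:aws:s3:::BUCKET_NAME",
    "arn:aws:s3:::BUCKET_NAME/*"
  ],
  "Condition": {
    "Bool": { "aws:SecureTransport": "false" }
  }
}
```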




&lt;h2&gt;
  
  
  5. CI/CD Pipelines With More Power Than Production
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The issue
&lt;/h3&gt;

&lt;p&gt;Build systems often have broader permissions than runtime services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can create IAM roles&lt;/li&gt;
&lt;li&gt;Can modify networking&lt;/li&gt;
&lt;li&gt;Can read secrets across environments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If CI/CD is compromised, production is already lost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix (pure settings)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Scope pipeline roles to &lt;strong&gt;deployment-only&lt;/strong&gt; permissions&lt;/li&gt;
&lt;li&gt;Separate &lt;strong&gt;build&lt;/strong&gt; and &lt;strong&gt;deploy&lt;/strong&gt; roles&lt;/li&gt;
&lt;li&gt;Enforce &lt;code&gt;iam:PassRole&lt;/code&gt; restrictions&lt;/li&gt;
&lt;li&gt;Rotate pipeline credentials aggressively&lt;/li&gt;
&lt;li&gt;Use AWS-managed &lt;strong&gt;OIDC&lt;/strong&gt; instead of static secrets&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Your pipeline is part of your attack surface. Configure it like one.&lt;/p&gt;
&lt;/blockquote&gt;
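
&lt;p&gt;Using GitHub Actions as one concrete example of an OIDC provider (ACCOUNT_ID, ORG, and REPO are placeholders), the role’s trust policy scopes federation to a single repository and branch, with no static secret anywhere:&lt;/p&gt;

```json
{
  "Effect": "Allow",
  "Principal": {
    "Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/token.actions.githubusercontent.com"
  },
  "Action": "sts:AssumeRoleWithWebIdentity",
  "Condition": {
    "StringEquals": {
      "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
    },
    "StringLike": {
      "token.actions.githubusercontent.com:sub": "repo:ORG/REPO:ref:refs/heads/main"
    }
  }
}
```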




&lt;h2&gt;
  
  
  6. GuardDuty Enabled But Ignored
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The issue
&lt;/h3&gt;

&lt;p&gt;GuardDuty is enabled, and findings exist, but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nobody owns them&lt;/li&gt;
&lt;li&gt;No severity mapping&lt;/li&gt;
&lt;li&gt;No automated response&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates alert fatigue with zero protection.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix (pure settings)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Route high-severity findings to &lt;strong&gt;EventBridge&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Auto-tag compromised resources&lt;/li&gt;
&lt;li&gt;Trigger isolation playbooks (security groups, IAM disable)&lt;/li&gt;
&lt;li&gt;Define escalation paths per severity&lt;/li&gt;
&lt;/ul&gt;
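&lt;p&gt;The first fix above hinges on an EventBridge event pattern that selects only findings worth waking someone up for. A minimal sketch, using GuardDuty's real event source and its 0&amp;#8211;10 severity scale (7.0 and above is "high"); the local helper is just a sanity check, not an AWS API:&lt;/p&gt;

```python
import json

# EventBridge pattern: match only high-severity GuardDuty findings.
# "aws.guardduty" and "GuardDuty Finding" are the actual source and
# detail-type GuardDuty emits; the numeric filter is EventBridge's
# content-filtering syntax.
pattern = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"severity": [{"numeric": [">=", 7]}]},
}

def is_high_severity(finding: dict) -> bool:
    """Mirror the numeric filter locally for a quick sanity check."""
    return finding.get("source") == "aws.guardduty" and finding["detail"]["severity"] >= 7

sample = {"source": "aws.guardduty", "detail": {"severity": 8.0}}
print(json.dumps(pattern))
print(is_high_severity(sample))  # True
```

&lt;p&gt;A rule with this pattern can then target the tagging and isolation playbooks listed above, while low-severity findings stay out of the paging path.&lt;/p&gt;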

&lt;blockquote&gt;
&lt;p&gt;Detection without response is noise.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;Most AWS security problems in 2026 are not caused by missing tools.&lt;/p&gt;

&lt;p&gt;They are caused by &lt;strong&gt;unconfigured intent&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The platform already provides controls like IAM conditions, SCPs, Config rules, centralized logging, and event-driven responses. The gap is not in capability. It’s attention.&lt;/p&gt;

&lt;p&gt;Security improves fastest when teams stop asking:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“What should we buy?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;and start asking:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“Which defaults should never have been trusted?”&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>secops</category>
      <category>security</category>
      <category>cloudcomputing</category>
    </item>
    <item>
      <title>AWS Is Moving Toward AI Factories, Not One-Off AI Projects</title>
      <dc:creator>Thej Deep</dc:creator>
      <pubDate>Tue, 13 Jan 2026 04:54:02 +0000</pubDate>
      <link>https://forem.com/thej_deep_457/aws-is-moving-toward-ai-factories-not-one-off-ai-projects-45ge</link>
      <guid>https://forem.com/thej_deep_457/aws-is-moving-toward-ai-factories-not-one-off-ai-projects-45ge</guid>
      <description>&lt;h2&gt;AWS Is Moving Toward AI Factories, Not One-Off AI Projects&lt;/h2&gt;

&lt;p&gt;
Most teams began their AI journey by running models in the cloud.
&lt;/p&gt;

&lt;p&gt;
That approach worked for experimentation, but it breaks down quickly in production where reliability, cost control, governance, and continuous improvement matter far more than model accuracy alone.
&lt;/p&gt;

&lt;p&gt;
What &lt;strong&gt;AWS&lt;/strong&gt; is enabling now represents a fundamental shift.
&lt;/p&gt;

&lt;p&gt;
This is no longer about deploying isolated models or calling an API.
 It is about building repeatable systems that continuously produce intelligence. 
&lt;/p&gt;




&lt;h3&gt;What Is an AI Factory?&lt;/h3&gt;

&lt;p&gt;
An AI Factory is &lt;strong&gt;not&lt;/strong&gt; a single service or tool.
&lt;/p&gt;

&lt;p&gt;
It is a platform capability that continuously:
&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Ingests and governs data&lt;/li&gt;
  &lt;li&gt;Trains or fine-tunes models&lt;/li&gt;
  &lt;li&gt;Runs inference reliably at scale&lt;/li&gt;
  &lt;li&gt;Observes quality, performance, and cost&lt;/li&gt;
  &lt;li&gt;Feeds those signals back into the system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
Just as CI/CD standardized software delivery, &lt;strong&gt;AI Factories bring structure, repeatability, and operational discipline to AI&lt;/strong&gt;.
&lt;/p&gt;

&lt;p&gt;
AI becomes part of the platform—not a side project.
&lt;/p&gt;




&lt;h3&gt;A Simple AWS Reference Architecture&lt;/h3&gt;

&lt;pre&gt;
[Applications &amp;amp; APIs]
        |
        v
[API Gateway / Service Mesh]
        |
        v
[Amazon Bedrock]
  - Foundation models
  - Fine-tuning
  - Safety guardrails
        |
        v
[Compute Layer]
  - AWS Trainium
  - AWS Graviton
        |
        v
[Data Layer]
  - Amazon S3
  - Lake Formation
        |
        v
[Observability &amp;amp; Governance]
  - CloudWatch
  - OpenTelemetry
  - IAM &amp;amp; cost controls
&lt;/pre&gt;

&lt;p&gt;
This architecture illustrates a critical shift:
&lt;/p&gt;

&lt;p&gt;
&lt;strong&gt;AI is embedded into the platform lifecycle&lt;/strong&gt;, not deployed as an isolated workload.
&lt;/p&gt;




&lt;h3&gt;Why This Matters in Practice&lt;/h3&gt;

&lt;p&gt;
Traditional AI platforms often fail in production because:
&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Pipelines are fragile&lt;/li&gt;
  &lt;li&gt;Costs are unpredictable&lt;/li&gt;
  &lt;li&gt;Governance is added too late&lt;/li&gt;
  &lt;li&gt;Scaling requires redesign&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
AI Factories address these issues by being:
&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Cloud-native and event-driven&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Observable by default&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Secure and governed from day one&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Scalable without re-architecture&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
This dramatically reduces friction when moving from proof-of-concept to production.
&lt;/p&gt;




&lt;h3&gt;Key AWS Building Blocks That Enable AI Factories&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;strong&gt;Amazon Bedrock&lt;/strong&gt;: managed access to foundation models with built-in data isolation, governance, and guardrails.
  &lt;/li&gt;

  &lt;li&gt;
    &lt;strong&gt;AWS Trainium and Graviton&lt;/strong&gt;: compute designed for AI economics, critical when inference and retraining run continuously.
  &lt;/li&gt;

  &lt;li&gt;
    &lt;strong&gt;Event-driven pipelines&lt;/strong&gt;: systems respond to new data, model drift, or demand signals rather than static schedules.
  &lt;/li&gt;

  &lt;li&gt;
    &lt;strong&gt;Built-in observability&lt;/strong&gt; with CloudWatch and OpenTelemetry: model behavior, latency, and cost become measurable and actionable.
  &lt;/li&gt;

  &lt;li&gt;
    &lt;strong&gt;Security and compliance&lt;/strong&gt; enforced as part of the platform, not bolted on later.
  &lt;/li&gt;
&lt;/ul&gt;
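&lt;p&gt;The event-driven bullet above is the core of the factory loop, so it is worth making concrete. The sketch below is purely illustrative: the event names, payloads, and handlers are hypothetical, not AWS APIs. The point it demonstrates is that retraining and ingestion react to signals rather than a fixed schedule:&lt;/p&gt;

```python
# Illustrative event bus for an "AI factory" loop. Event names like
# "model.drift" are invented for this sketch; in practice these would be
# EventBridge rules targeting pipeline steps.
from typing import Callable, Dict, List

class FactoryBus:
    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = {}

    def on(self, event: str, handler: Callable[[dict], None]) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event: str, payload: dict) -> None:
        for handler in self._handlers.get(event, []):
            handler(payload)

bus = FactoryBus()
actions: List[str] = []

# Drift queues a fine-tuning run; new data triggers governed ingestion.
bus.on("model.drift", lambda p: actions.append(f"retrain:{p['model']}"))
bus.on("data.arrived", lambda p: actions.append(f"ingest:{p['dataset']}"))

bus.emit("model.drift", {"model": "support-bot", "drift": 0.31})
bus.emit("data.arrived", {"dataset": "tickets-2026-01"})
print(actions)  # ['retrain:support-bot', 'ingest:tickets-2026-01']
```

&lt;p&gt;Nothing in the loop runs on a cron; each stage fires because an upstream signal said it should, which is what makes the factory continuous.&lt;/p&gt;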




&lt;h3&gt;Why Architects Should Pay Attention&lt;/h3&gt;

&lt;p&gt;
This shift is &lt;strong&gt;not&lt;/strong&gt; about choosing a better model.
&lt;/p&gt;

&lt;p&gt;
It is about designing &lt;strong&gt;platforms where AI can evolve safely over time&lt;/strong&gt;.
&lt;/p&gt;

&lt;p&gt;
Teams that adopt an AI Factory mindset can:
&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Treat models like deployable artifacts&lt;/li&gt;
  &lt;li&gt;Apply policy and automation consistently&lt;/li&gt;
  &lt;li&gt;Control cost, risk, and blast radius as systems grow&lt;/li&gt;
&lt;/ul&gt;
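&lt;p&gt;"Treat models like deployable artifacts" can be pictured as a versioned record that carries its policy and cost metadata from the moment it is registered. All field names below are illustrative, not a specific AWS schema:&lt;/p&gt;

```python
# Sketch of a model-as-artifact record: version, guardrails, and a cost
# budget travel with the model, so policy and automation can act on them
# consistently. Every field name here is hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelArtifact:
    name: str
    version: str
    base_model: str
    monthly_cost_budget_usd: float
    guardrails: tuple = ()

    @property
    def artifact_id(self) -> str:
        return f"{self.name}:{self.version}"

artifact = ModelArtifact(
    name="claims-summarizer",
    version="1.4.0",
    base_model="foundation-model-x",
    monthly_cost_budget_usd=1200.0,
    guardrails=("pii-redaction", "toxicity-filter"),
)
print(artifact.artifact_id)  # claims-summarizer:1.4.0
```

&lt;p&gt;Because the record is immutable and versioned, a deployment pipeline can promote, roll back, or budget-check a model the same way it handles any other artifact.&lt;/p&gt;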

&lt;p&gt;
This is the difference between &lt;em&gt;running AI&lt;/em&gt; and &lt;em&gt;operating AI at scale&lt;/em&gt;.
&lt;/p&gt;




&lt;h3&gt;Final Thought&lt;/h3&gt;

&lt;p&gt;
The cloud is no longer just hosting AI workloads.
&lt;/p&gt;

&lt;p&gt;
It is becoming the place where &lt;strong&gt;intelligence is built, refined, and delivered continuously&lt;/strong&gt;.
&lt;/p&gt;

&lt;p&gt;
AWS’s move toward AI Factories is a strong signal of where &lt;strong&gt;production-grade AI architecture&lt;/strong&gt; is heading next.
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>tutorial</category>
      <category>cloudcomputing</category>
    </item>
  </channel>
</rss>
