<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Peter Diakov</title>
    <description>The latest articles on Forem by Peter Diakov (@peter_dyakov_06f3c69a46b7).</description>
    <link>https://forem.com/peter_dyakov_06f3c69a46b7</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2075891%2Fc4992686-21c1-475f-859d-b1949507f646.jpg</url>
      <title>Forem: Peter Diakov</title>
      <link>https://forem.com/peter_dyakov_06f3c69a46b7</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/peter_dyakov_06f3c69a46b7"/>
    <language>en</language>
    <item>
      <title>CloudFront: Where You Lose Money</title>
      <dc:creator>Peter Diakov</dc:creator>
      <pubDate>Tue, 23 Dec 2025 20:02:41 +0000</pubDate>
      <link>https://forem.com/peter_dyakov_06f3c69a46b7/cloudfront-where-you-lose-money-2kam</link>
      <guid>https://forem.com/peter_dyakov_06f3c69a46b7/cloudfront-where-you-lose-money-2kam</guid>
      <description>&lt;p&gt;CloudFront is usually added to an architecture with good intentions: performance, reliability, lower origin load.Then, months later, it shows up near the top of the AWS bill and nobody is sure why.&lt;br&gt;
The uncomfortable truth is that CloudFront rarely becomes expensive because of one big mistake.It becomes expensive because it &lt;em&gt;amplifies small inefficiencies at scale&lt;/em&gt;.&lt;br&gt;
Here are the places where CloudFront quietly burns money and what to look at first.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;Caching is not a checkbox&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Most CloudFront cost problems start with caching that exists, but isn’t designed.&lt;br&gt;
Different types of content behave very differently, yet they’re often served under the same cache policy. Immutable static assets, frequently changing HTML, and API responses all deserve different treatment.&lt;br&gt;
Files with unique names (especially hashed assets) should be cached aggressively. If the filename changes on every build, the file itself is effectively immutable. Revalidating it every few minutes is pure waste.&lt;br&gt;
At the same time, content like &lt;code&gt;index.html&lt;/code&gt; does change, but disabling caching entirely is rarely the right answer. A short TTL is usually enough to balance freshness and cost.&lt;br&gt;
Once CloudFront caching is reasonable, browser caching becomes the next win. Proper &lt;code&gt;Cache-Control&lt;/code&gt; headers on S3 objects reduce repeated requests and quietly cut costs without touching infrastructure.&lt;/p&gt;
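The split above can be sketched as a small helper that assigns a Cache-Control header by content type. The path patterns and TTL values are illustrative assumptions, not official guidance; tune them to your own deploy cadence.

```python
import re

# Sketch: choosing Cache-Control per content type (illustrative values).
def cache_control_for(path: str) -> str:
    """Return a Cache-Control header for an S3 object based on its path."""
    # Hashed build artifacts (e.g. app.3f2a9c1b.js) never change in place:
    # cache them aggressively and mark them immutable.
    if re.search(r"\.[0-9a-f]{6,}\.(js|css|woff2?|png|svg)$", path):
        return "public, max-age=31536000, immutable"
    # HTML changes between deploys: a short TTL balances freshness and cost.
    if path.endswith(".html"):
        return "public, max-age=60, must-revalidate"
    # Everything else: a moderate default.
    return "public, max-age=3600"
```

Applying headers like these when uploading to S3 cuts repeat requests from browsers before CloudFront is even involved.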




&lt;blockquote&gt;
&lt;p&gt;If CloudFront isn’t the only entry point, you’re overpaying&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When content is downloaded directly from S3 instead of passing through CloudFront, you don’t just lose money - you also lose security.&lt;br&gt;
This usually happens when an application accidentally exposes raw S3 URLs or when old links still point to the bucket. From a user perspective everything still works, but behind the scenes S3 is serving traffic that CloudFront should be handling.&lt;br&gt;
Financially, this bypass means no caching and no compression, which translates directly into higher S3 data transfer costs. But the bigger issue is &lt;em&gt;security&lt;/em&gt;.&lt;br&gt;
An S3 bucket that’s accessible to the public or reachable directly from the internet is a misconfiguration. In production, S3 should only serve content through CloudFront, enforced by &lt;em&gt;Origin Access Control&lt;/em&gt; (OAC).&lt;br&gt;
When OAC is configured, CloudFront becomes the only trusted entry point. Everything else gets blocked.&lt;br&gt;
If users — or bots — can reach S3 directly, you're both overspending and exposing your storage layer to unnecessary risk.&lt;/p&gt;
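For reference, a minimal sketch of the bucket policy that OAC enforcement relies on, built here as a Python dict. The bucket name and distribution ARN are placeholders; the statement shape follows the documented OAC pattern (CloudFront service principal plus an AWS:SourceArn condition).

```python
import json

# Sketch of the S3 bucket policy that restricts reads to one CloudFront
# distribution via Origin Access Control. Bucket name and distribution ARN
# below are placeholders -- substitute your own.
BUCKET = "my-app-assets"
DIST_ARN = "arn:aws:cloudfront::111122223333:distribution/EDFDVBD6EXAMPLE"

oac_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowCloudFrontServicePrincipalReadOnly",
        "Effect": "Allow",
        "Principal": {"Service": "cloudfront.amazonaws.com"},
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{BUCKET}/*",
        # Only requests signed on behalf of this one distribution pass.
        "Condition": {"StringEquals": {"AWS:SourceArn": DIST_ARN}},
    }],
}

print(json.dumps(oac_policy, indent=2))
```

With this policy in place and Block Public Access enabled, direct S3 URLs stop working and CloudFront becomes the only entry point.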




&lt;blockquote&gt;
&lt;p&gt;Compression is either on - or you’re paying for air&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Compression issues are easy to miss because they don’t break functionality.&lt;br&gt;
Some clients and third-party tools send headers that explicitly disable compression. If your origin respects those headers, responses are delivered uncompressed, increasing payload size and data transfer cost.&lt;br&gt;
CloudFront can handle compression safely but only if automatic compression is enabled. This one setting can be the difference between a reasonable bill and a confusing one.&lt;/p&gt;
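To get a feel for what uncompressed delivery costs, here is a quick, self-contained comparison of a text payload with and without gzip. The payload is synthetic; real-world ratios vary, but text and JSON typically shrink dramatically.

```python
import gzip

# Synthetic JSON-like payload (~58 KB of repetitive text).
payload = ('{"id": 1, "name": "example", "tags": ["a", "b"]}\n' * 2000).encode()

compressed = gzip.compress(payload)
savings = 1 - len(compressed) / len(payload)

# Every one of these saved bytes is a byte CloudFront does not bill
# as data transfer out.
print(f"raw: {len(payload)} B, gzip: {len(compressed)} B, saved: {savings:.0%}")
```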

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fff8poloum5xgxybk1m6w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fff8poloum5xgxybk1m6w.png" alt=" " width="800" height="224"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;Zombie distributions still cost real money&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;CloudFront distributions tend to accumulate.&lt;br&gt;
Proof-of-concepts, temporary domains, legacy projects - they’re rarely deleted. Even if nobody uses them intentionally, bots often do.&lt;br&gt;
A quick review of distributions and their traffic metrics often reveals “ghost” resources that should have been disabled months ago. Disable first, delete later.&lt;br&gt;
Unused infrastructure that still receives traffic is pure waste.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;Global CDN for a local product = wasted budget&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By default, CloudFront serves content from edge locations worldwide.&lt;br&gt;
If your users are primarily in one region, limiting the distribution to an appropriate price class can reduce data transfer costs without affecting real users. Many teams never revisit this setting after initial setup.&lt;br&gt;
Global reach is powerful but unnecessary reach is expensive.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;Security rules also show up on the bill&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AWS WAF protects CloudFront, but it also evaluates rules on every request.&lt;br&gt;
Over time, rule sets grow. Managed rules are enabled “just in case”, logging is turned on for everything, and requests that should be blocked early continue through the system.&lt;br&gt;
Regular WAF reviews reduce unnecessary processing and lower CloudFront costs at the same time. Security and cost optimization are not opposites here.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;Every extra kilobyte is multiplied by traffic&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Even with perfect caching, CloudFront charges for data transferred.&lt;br&gt;
This is where developers matter most. It’s worth reviewing what the client actually receives, especially on the initial page load. Many applications return configuration data, metadata, or API fields that the frontend no longer uses.&lt;br&gt;
Every extra kilobyte is multiplied by traffic volume. Compression helps, but it doesn’t make unnecessary data free.&lt;br&gt;
Reducing payload size improves performance, lowers CloudFront costs, and reduces origin load without touching infrastructure.&lt;/p&gt;
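A minimal sketch of the idea: keep only the fields the frontend actually reads before the response leaves the origin. The field names and the whitelist are invented for illustration.

```python
import json

# Hypothetical whitelist of fields the frontend actually uses.
FRONTEND_FIELDS = {"id", "title", "price"}

def trim(record: dict) -> dict:
    """Drop response fields the client never reads."""
    return {k: v for k, v in record.items() if k in FRONTEND_FIELDS}

full = {
    "id": 7, "title": "Widget", "price": 9.99,
    "internal_sku": "X-7734",
    "audit_log": ["created", "updated"],
    "legacy_flags": {"a": True, "b": False},
}

before = len(json.dumps(full))
after = len(json.dumps(trim(full)))
# Every byte removed here is multiplied by request volume at the CDN.
print(f"{before} B -> {after} B per record")
```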




&lt;blockquote&gt;
&lt;p&gt;You can’t optimize what you don’t monitor&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;CloudFront rarely becomes expensive overnight. Costs usually drift upward quietly.&lt;br&gt;
Cost anomaly detection, monthly cost reports, and distribution-level monitoring turn CloudFront from a surprise into a controlled system. Without visibility, even good architectures decay.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;CloudFront doesn’t waste your money. It &lt;em&gt;accurately bills you for inefficiency&lt;/em&gt;.&lt;br&gt;
Every missed cache opportunity, every extra header, every forgotten distribution is multiplied by traffic. Treat CloudFront as a living system (not a one-time setup) and it will stay cheap and predictable.&lt;/p&gt;

</description>
      <category>cdn</category>
      <category>aws</category>
      <category>finops</category>
      <category>devops</category>
    </item>
    <item>
      <title>CPU Limits in Kubernetes: Mostly Harmful, Occasionally Essential</title>
      <dc:creator>Peter Diakov</dc:creator>
      <pubDate>Sat, 20 Dec 2025 09:01:32 +0000</pubDate>
      <link>https://forem.com/peter_dyakov_06f3c69a46b7/cpu-limits-in-kubernetes-mostly-harmful-occasionally-essential-3i56</link>
      <guid>https://forem.com/peter_dyakov_06f3c69a46b7/cpu-limits-in-kubernetes-mostly-harmful-occasionally-essential-3i56</guid>
      <description>&lt;p&gt;CPU limits in Kubernetes are often treated as a mandatory best practice. Define requests, define limits, move on.&lt;br&gt;
Over time, however, many teams discover that CPU limits, especially for application workloads, introduce more problems than they solve.&lt;/p&gt;

&lt;p&gt;In this article, I’ll explain &lt;strong&gt;why CPU limits are frequently counterproductive&lt;/strong&gt;, and then describe a real production incident where &lt;strong&gt;the absence of CPU limits on a specific type of workload led to a node failure&lt;/strong&gt;. The takeaway is not a reversal of the original idea, but a clearer understanding of where it applies and where it absolutely does not.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why CPU Limits Often Hurt More Than They Help
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;CPU Limits Introduce Artificial Throttling&lt;/strong&gt;&lt;br&gt;
In Kubernetes, CPU limits are enforced by the Linux scheduler using cgroups. Once a container reaches its assigned quota, it is forcibly throttled within the current scheduling period, even if the node has idle CPU capacity available.&lt;br&gt;
For workloads with bursty CPU patterns, this behavior is harmful. Many services occasionally need short-lived CPU spikes to complete work efficiently: handling request bursts, warming caches, or performing runtime maintenance tasks. When limits are set, those spikes turn into throttling events, increasing request latency and amplifying tail delays.&lt;br&gt;
In practice, this often means worse performance on an otherwise healthy and underutilized node.&lt;/p&gt;
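The throttling mechanics can be made concrete with a little arithmetic. CFS bandwidth control works in scheduling periods (100 ms by default): the CPU limit translates into a per-period quota, and any demand beyond that quota becomes forced waiting. A sketch, with illustrative numbers:

```python
# How CFS bandwidth control turns a CPU limit into throttling.
# With the default 100 ms period, a limit of 0.5 CPU yields a 50 ms quota:
# once a container burns 50 ms of CPU inside a period, it is paused until
# the next period starts -- even if the node is otherwise idle.

PERIOD_MS = 100  # kernel default: cfs_period_us = 100000

def throttled_ms(limit_cores: float, demand_ms: float) -> float:
    """CPU time (ms) the container wanted this period but could not get."""
    quota_ms = limit_cores * PERIOD_MS
    return max(0.0, demand_ms - quota_ms)

# A bursty request needing 80 ms of CPU under a 500m limit gets 30 ms of
# forced waiting added to its latency in that period alone.
print(throttled_ms(0.5, 80))
```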

&lt;p&gt;&lt;strong&gt;Removing CPU Limits Can Improve Real-World Stability&lt;/strong&gt;&lt;br&gt;
In several production environments, removing CPU limits from application workloads has led to measurable improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lower latency under load&lt;/li&gt;
&lt;li&gt;Faster recovery from traffic spikes&lt;/li&gt;
&lt;li&gt;Reduced throttling without additional infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Autoscaling mechanisms such as HPA work best when containers can fully utilize available CPU. Artificial caps interfere with this feedback loop and delay scale-out exactly when it’s most needed.&lt;br&gt;
For many application services, CPU limits end up solving a problem that doesn’t exist, while creating one that does.&lt;/p&gt;

&lt;h2&gt;
  
  
  When This Approach Becomes Dangerous
&lt;/h2&gt;

&lt;p&gt;The guidance above assumes one critical condition:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The workload must fully respect Kubernetes resource isolation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not all workloads do.&lt;br&gt;
We encountered this firsthand in a Kubernetes management cluster running build agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Incident: Node CPU Saturation and NotReady State&lt;/strong&gt;&lt;br&gt;
A worker node suddenly reached near-constant 100% CPU utilization and remained there for several minutes. Shortly after:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The kubelet stopped reporting heartbeats&lt;/li&gt;
&lt;li&gt;The node transitioned to NotReady&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At the same time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pod-level CPU metrics looked normal&lt;/li&gt;
&lt;li&gt;No throttling was visible at the pod level&lt;/li&gt;
&lt;li&gt;Nothing appeared obviously misconfigured&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The build agent pod running on the node &lt;strong&gt;did not have CPU limits configured by design&lt;/strong&gt;, following the “no CPU limits” philosophy.&lt;br&gt;
So why did a single pod manage to destabilize the entire node?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsj985rw4va5vp8l2uac.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsj985rw4va5vp8l2uac.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Root Cause: Privileged Build Workloads Bypass Assumptions
&lt;/h2&gt;

&lt;p&gt;The build agent was running as a privileged pod and started its own container runtime internally to execute jobs.&lt;br&gt;
This distinction matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Actually Happened&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The pod itself was scheduled normally and respected its CPU request&lt;/li&gt;
&lt;li&gt;Inside the pod, a container runtime launched additional processes&lt;/li&gt;
&lt;li&gt;Those processes were not constrained by pod-level CPU isolation&lt;/li&gt;
&lt;li&gt;Under heavy workload, they consumed all available node CPU&lt;/li&gt;
&lt;li&gt;The kubelet was starved of CPU time&lt;/li&gt;
&lt;li&gt;Node health checks failed, and the node became &lt;code&gt;NotReady&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This was not a bug in Kubernetes. It was a mismatch between &lt;strong&gt;assumed isolation&lt;/strong&gt; and &lt;strong&gt;actual workload&lt;/strong&gt; behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Revisiting the Question: Should CPU Limits Be Used?&lt;/strong&gt;&lt;br&gt;
The correct answer is neither “always” nor “never”.&lt;br&gt;
CPU Limits Are Often Unnecessary For:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stateless application services&lt;/li&gt;
&lt;li&gt;Non-privileged containers&lt;/li&gt;
&lt;li&gt;Workloads without nested runtimes&lt;/li&gt;
&lt;li&gt;Large nodes with sufficient headroom&lt;/li&gt;
&lt;li&gt;Services managed by HPA&lt;/li&gt;
&lt;/ul&gt;
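For such workloads, the resource stanza this article argues for looks roughly like the following sketch (values are illustrative; note the memory limit stays, because memory overuse ends in OOM kills rather than throttling):

```python
# Sketch of the resources stanza for a trusted, well-behaved application
# container: a CPU request for scheduling and HPA math, but no CPU limit.
# Values are illustrative placeholders.
app_resources = {
    "requests": {"cpu": "250m", "memory": "256Mi"},
    "limits": {"memory": "256Mi"},  # memory limit kept; CPU limit omitted
}
```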

&lt;p&gt;CPU Limits or Strong Isolation Are Required For:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build agents and CI runners&lt;/li&gt;
&lt;li&gt;Privileged pods&lt;/li&gt;
&lt;li&gt;Workloads executing untrusted or user-defined code&lt;/li&gt;
&lt;li&gt;Nested container runtimes&lt;/li&gt;
&lt;li&gt;Small or mixed-purpose nodes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In these cases, assuming that “CPU limits are harmful” without additional isolation is a mistake.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical Recommendations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For build and CI workloads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use dedicated node groups&lt;/li&gt;
&lt;li&gt;Apply taints and tolerations&lt;/li&gt;
&lt;li&gt;Avoid colocating them with application workloads&lt;/li&gt;
&lt;li&gt;Enforce resource boundaries at the node level&lt;/li&gt;
&lt;li&gt;Prefer architectures that avoid nested runtimes&lt;/li&gt;
&lt;/ul&gt;
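The first two recommendations can be sketched as a node taint plus a matching pod toleration and node selector; the key names and values here are invented placeholders:

```python
# Sketch: isolating CI/build agents on a dedicated, tainted node group so
# they cannot starve application nodes. Keys and labels are hypothetical.
node_taint = {"key": "workload", "value": "ci", "effect": "NoSchedule"}

ci_pod_spec = {
    "tolerations": [{
        "key": "workload",
        "operator": "Equal",
        "value": "ci",
        "effect": "NoSchedule",
    }],
    # Pin to the dedicated group as well, so CI pods never land elsewhere.
    "nodeSelector": {"node-group": "ci-runners"},
}
```

The taint keeps application pods off the CI nodes; the toleration plus node selector keeps CI pods on them, so a runaway build can only saturate its own node group.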

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Removing CPU limits can significantly improve performance but only when workloads behave as expected and respect Kubernetes isolation boundaries.&lt;br&gt;
Privileged workloads and build systems operate under different rules. Applying application-level best practices to them without adjustment can destabilize the entire cluster.&lt;/p&gt;

&lt;p&gt;The real lesson is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Optimize trusted workloads. Isolate the rest.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>sre</category>
      <category>performance</category>
    </item>
    <item>
      <title>Claude Skills + MCP Server: Create Consistent AWS Cost Reports in Minutes</title>
      <dc:creator>Peter Diakov</dc:creator>
      <pubDate>Sat, 13 Dec 2025 09:43:32 +0000</pubDate>
      <link>https://forem.com/peter_dyakov_06f3c69a46b7/claude-skills-mcp-server-create-consistent-aws-cost-reports-in-minutes-322f</link>
      <guid>https://forem.com/peter_dyakov_06f3c69a46b7/claude-skills-mcp-server-create-consistent-aws-cost-reports-in-minutes-322f</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This article can be useful for FinOps engineers who want to automate their monthly cost reporting workflow. FinOps is all about managing cloud costs efficiently, but creating consistent executive reports every month is time-consuming and error-prone. Most companies spend heavily on cloud infrastructure, and CTOs need clear, standardized reports to track spending trends. As a FinOps engineer, you need a system that generates professional cost reports with the same structure every month - not ad-hoc prompts that vary in quality. In this article, I’ll show you how to build a Claude Skill that combines MCP Server with AWS Cost Explorer to create deterministic, executive-ready cost reports in just 3 minutes. Once configured, you’ll never need to generate complex prompts again.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is MCP Server?
&lt;/h2&gt;

&lt;p&gt;MCP (Model Context Protocol) Server is a technology that allows AI assistants like Claude to connect directly to external tools and data sources. Think of it as a bridge between Claude and your cloud infrastructure.&lt;br&gt;
In our case, the AWS Cost Explorer MCP Server enables Claude to make API calls to AWS Cost Explorer, fetch your actual spending data, analyze usage patterns, and retrieve detailed cost breakdowns - all in real-time. Without MCP, you’d need to manually export cost data from AWS, format it, and paste it into Claude. With MCP, Claude accesses the data directly and automatically.&lt;/p&gt;
&lt;h2&gt;
  
  
  What Are Claude Skills?
&lt;/h2&gt;

&lt;p&gt;Think of Claude Skills as reusable expert templates. Instead of explaining your requirements every single time, you create a skill file that contains all your instructions, preferences, and guidelines. Claude then follows this “playbook” perfectly every time you invoke it.&lt;br&gt;
For FinOps reporting, this means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Same professional structure every month&lt;/li&gt;
&lt;li&gt;Consistent analysis framework&lt;/li&gt;
&lt;li&gt;No forgotten requirements&lt;/li&gt;
&lt;li&gt;Executive-ready format guaranteed&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Prerequisites for Claude Skill Creation
&lt;/h2&gt;

&lt;p&gt;Before creating the skill, you need to set up the necessary tools and permissions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install Claude Desktop - Download and install from claude.ai/download&lt;/li&gt;
&lt;li&gt;Add AWS Cost Explorer MCP Server - Configure the MCP server in your Claude Desktop settings by adding it to your configuration file&lt;/li&gt;
&lt;li&gt;Configure AWS IAM Permissions - Create an IAM user with Cost Explorer read permissions and configure the credentials&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Detailed setup instructions for the MCP server and IAM configuration can be found in the &lt;a href="https://awslabs.github.io/mcp/servers/cost-explorer-mcp-server" rel="noopener noreferrer"&gt;AWS Cost Explorer MCP Server documentation&lt;/a&gt;.&lt;/p&gt;
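As a starting point, a read-only IAM policy for Cost Explorer might look like the sketch below, expressed as a Python dict. The listed ce:* actions are the common read calls; treat the exact set as an assumption and confirm it against the linked documentation.

```python
import json

# Sketch of a read-only IAM policy for Cost Explorer access.
# Action list is an assumption -- verify against the MCP server docs.
cost_explorer_read_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "ce:GetCostAndUsage",
            "ce:GetCostForecast",
            "ce:GetDimensionValues",
            "ce:GetTags",
        ],
        "Resource": "*",
    }],
}

print(json.dumps(cost_explorer_read_policy, indent=2))
```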
&lt;h2&gt;
  
  
  Creating an AWS Cost Report Skill
&lt;/h2&gt;

&lt;p&gt;Creating a Claude Skill is straightforward. You simply ask Claude: “Help me create a skill for generating monthly AWS cost reports.”&lt;br&gt;
Claude will then guide you through a series of clarifying questions to understand your specific requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What data sources and tools you’re using&lt;/li&gt;
&lt;li&gt;How you organize your infrastructure (tags, environments)&lt;/li&gt;
&lt;li&gt;What report structure your executives expect&lt;/li&gt;
&lt;li&gt;What level of detail and visualizations you need&lt;/li&gt;
&lt;li&gt;What analysis and recommendations should be included&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on your answers, Claude automatically generates a skill file containing detailed, structured instructions. This file defines exactly how Claude should fetch data, analyze it, generate visualizations, and format the final report.&lt;/p&gt;

&lt;p&gt;Once the skill is created and saved in Claude Desktop, using it is incredibly simple. Just type this command in the Claude prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Use the AWS Cost CEO Report skill to compare November and December 2025 costs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s it. Claude follows the instructions in the skill file and delivers a consistent, professional report every time.&lt;/p&gt;

&lt;p&gt;Here is an example of a Claude Skill reference file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# AWS Cost CEO Report Skill - Quick Reference


## 🚀 Quick Start

AWS cost report for the CEO - September vs October 2025

## ✅ Prerequisites Checklist
- [ ] All Cost Explorer MCP servers are started
- [ ] You have cost data for the months you want to compare


## 🎯 Trigger Phrases
| What to Say | What It Does |
|-------------|--------------|
| "AWS cost report for the CEO" | Generates full report for all accounts |
| "Compare September vs October 2025" | Month-to-month comparison |
| "Production account only" | Limits to single account |
| "We implemented VPC endpoints..." | Includes savings actions analysis |


## 📊 Report Includes
✅ Executive summary with key metrics 
✅ Visual charts and graphs 
✅ Per-account breakdowns 
✅ Cost driver analysis 
✅ Optimization recommendations 
✅ PDF export capability 


## 🎨 Report Filters
- Shows only services with &amp;gt;$50/month spend
- Analyzes services with &amp;gt;10% change OR &amp;gt;$100 change
- Focuses on top cost drivers


## 💡 Pro Tips

1. **Monthly Cadence**: Generate reports at the end of each month
2. **Include Context**: Mention infrastructure changes or savings actions
3. **Review Trends**: Look for unexpected cost increases
4. **Act on Recommendations**: Review the optimization section
5. **Export PDF**: Use browser print function to save as PDF


## 🚨 Common Issues

| Issue | Solution |
|-------|----------|
| MCP servers not accessible | Start Docker Desktop and MCP containers |
| Invalid month format | Use "September 2025" not "9/2025" |
| No data returned | Check dates aren't in future |


## 📈 Cost Optimization Areas Analyzed
- EC2 instance rightsizing
- NAT Gateway optimization (VPC endpoints)
- Data transfer costs
- Idle resources
- Over-provisioned infrastructure
- Development environment scheduling
- Storage lifecycle policies
- Reserved Instance opportunities


## 🎓 Example Requests

**Basic Report:**
Generate AWS cost report for the CEO comparing September vs October 2025

**With Savings Context:**
AWS cost report for September vs October 2025.

Actions taken:
- Implemented VPC endpoints
- Migrated to Graviton instances
- Enabled S3 Intelligent Tiering

## 📦 Skill Components
- Main workflow instructions (SKILL.md)
- MCP usage patterns (references/mcp-usage.md)
- Report structure guide (references/report-structure.md)
- HTML template reference (scripts/generate_report.py)


## 🔍 Analysis Depth
**Level 1**: Service-level cost comparison 
**Level 2**: Usage-type breakdown (instance types, hours) 
**Level 3**: Root cause identification 
**Level 4**: Contextual analysis with recommendations 


## 💰 Cost Leak Detection
The skill automatically flags:
- Unexpected spikes &amp;gt;20%
- High data transfer costs
- NAT Gateway optimization opportunities
- Idle resources
- Over-provisioned instances
- 24/7 dev/test environments
- Legacy unused resources

## 📄 Export to PDF
1. Open artifact in new window
2. Press Ctrl+P (or Cmd+P on Mac)
3. Select "Save as PDF"
4. Adjust settings if needed
5. Save and share!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Benefits for FinOps Teams:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Reproducibility: Same report quality every month&lt;/li&gt;
&lt;li&gt;Scalability: Easy to generate reports for multiple accounts/regions&lt;/li&gt;
&lt;li&gt;Compliance: Standardized format satisfies audit requirements&lt;/li&gt;
&lt;li&gt;Onboarding: New team members can generate reports without training&lt;/li&gt;
&lt;li&gt;Evolution: Update the skill once to improve all future reports&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;MCP Server combined with Claude Skills transforms FinOps reporting from a day-long manual process into a 3-minute automated workflow. You get the speed of AI-powered analysis with the consistency and professionalism that executive leadership demands.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>finops</category>
      <category>ai</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Scalable Multi-Tenant Architecture for Hundreds of Custom Domains</title>
      <dc:creator>Peter Diakov</dc:creator>
      <pubDate>Sat, 22 Nov 2025 15:14:19 +0000</pubDate>
      <link>https://forem.com/peter_dyakov_06f3c69a46b7/scalable-multi-tenant-architecture-for-hundreds-of-custom-domains-56mn</link>
      <guid>https://forem.com/peter_dyakov_06f3c69a46b7/scalable-multi-tenant-architecture-for-hundreds-of-custom-domains-56mn</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Modern SaaS commerce platforms often face a similar challenge: supporting a large number of customer-specific storefronts, each with its own custom domain, while still relying on a shared backend. When your infrastructure is built on EKS with CloudFront and an Application Load Balancer in front, and each tenant requires its own SSL certificate, scaling becomes a real architectural puzzle.&lt;/p&gt;

&lt;p&gt;This article describes the problem we encountered, the options we evaluated, and the architecture we ultimately implemented to handle hundreds of HTTPS-enabled custom domains cleanly and reliably.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpx5jazrczjrjugywffs0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpx5jazrczjrjugywffs0.png" alt=" " width="598" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Context: The Multi-Tenant Storefront Platform
&lt;/h2&gt;

&lt;p&gt;Our platform allows customers to create online shops under their own domains. Dozens or even hundreds of domains like: &lt;em&gt;storeABC.com&lt;/em&gt;, &lt;em&gt;brandshop.net&lt;/em&gt;, or &lt;em&gt;my-boutique.co.uk&lt;/em&gt; - all point to a shared CloudFront distribution and eventually arrive at the same ALB and EKS cluster. The backend determines the tenant based on the incoming Host header.&lt;/p&gt;

&lt;p&gt;For this to work securely, each domain needs its own HTTPS certificate. CloudFront now supports a Multi-Tenant SaaS distribution model, which makes onboarding new domains dramatically easier: certificates are created in us-east-1, automatically validated, and attached to a single shared CloudFront distribution that serves all tenants. AWS also allows up to 2,000 SSL certificates per Multi-Tenant SaaS distribution, and you can run several such distributions.&lt;/p&gt;

&lt;p&gt;This simplicity stops at CloudFront. The real challenge appears on the origin side.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Certificate Problem on the ALB Side
&lt;/h2&gt;

&lt;p&gt;CloudFront terminates HTTPS at the edge. After that, it connects to the origin - in our case, an Application Load Balancer - using HTTPS again, so the ALB also needs certificates for the TLS handshake.&lt;/p&gt;

&lt;p&gt;In AWS, the ALB has a strict quota:&lt;br&gt;
&lt;strong&gt;→ 100 SSL certificates per ALB&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In a system with 300+ custom domains, this limit becomes a blocker. This brings us to the architectural decision point.&lt;/p&gt;
&lt;h2&gt;
  
  
  Architecture Options We Considered
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Using CloudFront VPC Origin to Avoid SSL on the ALB&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of the first ideas was to place CloudFront’s origin inside the VPC using the VPC Origin feature. Traffic would travel over AWS’s private network and could therefore be sent to the ALB over plain HTTP, eliminating the need for certificates on the ALB entirely. It’s elegant - until you discover that VPC Origin does not support WebSockets, a requirement for our application. That limitation alone ruled out this approach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Splitting Tenants Across Multiple ALBs and Distributions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Another possibility involved splitting tenants across several CloudFront SaaS distributions and several ALBs. This bypasses the 100-cert limit by distributing certificates across multiple ALBs and CloudFront distributions.&lt;/p&gt;

&lt;p&gt;Why we rejected it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Massive IaC overhead&lt;/li&gt;
&lt;li&gt;Requires tracking which tenant belongs to which ALB &amp;amp; distribution&lt;/li&gt;
&lt;li&gt;Complicated domain lifecycle management&lt;/li&gt;
&lt;li&gt;Harder observability &amp;amp; error tracing&lt;/li&gt;
&lt;li&gt;High risk of misconfiguration&lt;/li&gt;
&lt;li&gt;Poor maintainability at scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Technically feasible - but operationally a nightmare.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Solution: One ALB, One Distribution, One Internal Certificate
&lt;/h2&gt;

&lt;p&gt;What ultimately made this architecture scalable was realizing that CloudFront does not require the same domain that the customer is visiting to be used as the origin domain. CloudFront only expects that the origin domain configured in the distribution has a valid certificate on the ALB - nothing more. At the same time, CloudFront is perfectly capable of forwarding the original Host header to the backend.&lt;/p&gt;

&lt;p&gt;This allowed us to decouple tenant domains from the ALB entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How It Works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We introduced a single internal domain, for example:&lt;br&gt;
&lt;code&gt;origin.example.com&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This domain points to the ALB’s DNS name via a CNAME record. The ALB holds just one TLS certificate for this internal domain, and CloudFront uses this domain as its sole origin for all tenants.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User → https://storeABC.com
 ↓
CloudFront (receives request)
- TLS handshake with storeABC.com
- Decrypts request
- Forwards request to origin
 ↓
CloudFront → Origin (ALB via CNAME):
- New HTTPS request to https://origin.example.com
- TLS handshake with ALB (*.example.com)
- Sends HTTP request with Host: storeABC.com
 ↓
ALB:
- Terminates TLS
- Receives Host: storeABC.com
- Forwards to backend service/pod in EKS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this setup, the number of tenant domains no longer affects the ALB at all. Scaling from 10 to 1,000 custom domains is simply a matter of attaching more certificates to CloudFront, which is designed for that scale.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fisoeekstc5ik4cfpxn6f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fisoeekstc5ik4cfpxn6f.png" alt=" " width="717" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Architecture Works Well
&lt;/h2&gt;

&lt;p&gt;This solution eliminates the ALB certificate quota entirely while preserving strong end-to-end encryption. It keeps the infrastructure clean: only one CloudFront distribution, one ALB, one certificate on the origin, and no complicated domain-to-origin mapping. WebSockets work because the origin is a standard HTTPS endpoint rather than a VPC-origin endpoint. And from an IaC perspective, the system becomes straightforward to automate and reason about.&lt;/p&gt;

&lt;p&gt;Most importantly, the design uses only native AWS capabilities with no custom components or service-specific tricks, making it robust, predictable, and cloud-portable if necessary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Designing a multi-tenant architecture capable of handling hundreds of custom domains is not as simple as placing CloudFront in front of an ALB. Certificate limits, WebSocket support, and management complexity all influence the final design. After evaluating several approaches, we found that introducing a single internal origin domain offers the cleanest and most scalable solution. CloudFront takes responsibility for per-tenant TLS termination, while the ALB handles only a single internal certificate. The backend receives correct tenant routing through the Host header, and the system remains simple enough to maintain and automate.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>architecture</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
