<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: abidaslam892</title>
    <description>The latest articles on Forem by abidaslam892 (@abidaslam892).</description>
    <link>https://forem.com/abidaslam892</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3579881%2Ffa5cf121-fd9d-422e-a2a6-d52aef77a8b2.png</url>
      <title>Forem: abidaslam892</title>
      <link>https://forem.com/abidaslam892</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/abidaslam892"/>
    <language>en</language>
    <item>
      <title>Building a Production-Multi-Cloud DevOps Platform: A Complete Journey from Zero to Hero</title>
      <dc:creator>abidaslam892</dc:creator>
      <pubDate>Sun, 23 Nov 2025 09:58:32 +0000</pubDate>
      <link>https://forem.com/abidaslam892/building-a-production-multi-cloud-devops-platform-a-complete-journey-from-zero-to-hero-29g0</link>
      <guid>https://forem.com/abidaslam892/building-a-production-multi-cloud-devops-platform-a-complete-journey-from-zero-to-hero-29g0</guid>
      <description>&lt;p&gt;&lt;strong&gt;Building a Production-Multi-Cloud DevOps Platform: A Complete Journey from Zero to Hero&lt;/strong&gt;&lt;br&gt;
Abidaslam&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The original article is available on Medium:&lt;br&gt;
&lt;a href="https://medium.com/design-bootcamp/building-a-production-multi-cloud-devops-platform-a-complete-journey-from-zero-to-hero-ef292ff0f0c6" rel="noopener noreferrer"&gt;https://medium.com/design-bootcamp/building-a-production-multi-cloud-devops-platform-a-complete-journey-from-zero-to-hero-ef292ff0f0c6&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How I Built and Deployed a FastAPI Application Across AWS EKS and Azure AKS with Full CI/CD, Security Scanning, and Observability&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A comprehensive guide to building enterprise-grade cloud infrastructure with security-first principles&lt;br&gt;
I built a complete multi-cloud DevOps platform that deploys a Python FastAPI application to both AWS EKS and Azure AKS with:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure as Code (Terraform) for AWS and Azure&lt;br&gt;
CI/CD Pipelines (GitHub Actions) with automated testing and security scanning&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Container Security with Trivy and Checkov&lt;br&gt;
Full Observability with Prometheus, Grafana, and Loki&lt;br&gt;
Cost Optimization achieving 96% cost reduction ($141/month → $5/month)&lt;br&gt;
Production-ready Kubernetes deployments with Helm&lt;br&gt;
Project Repository: &lt;a href="https://github.com/abidaslam892/multi-cloud-devsecops" rel="noopener noreferrer"&gt;github.com/abidaslam892/multi-cloud-devsecops&lt;/a&gt;&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;Table of Contents&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The Challenge&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Architecture Overview&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tech Stack&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implementation Journey&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Infrastructure as Code&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CI/CD Pipeline&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Security Implementation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Monitoring &amp;amp; Observability&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cost Optimization&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Results &amp;amp; Metrics&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lessons Learned&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What’s Next&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The Challenge&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As a DevOps engineer, I wanted to build a project that demonstrates real-world enterprise practices. The goal wasn’t just to deploy an application to the cloud, but to create a production-grade platform that showcases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Multi-cloud expertise (AWS + Azure)&lt;/li&gt;
&lt;li&gt;Infrastructure automation&lt;/li&gt;
&lt;li&gt;Security-first approach&lt;/li&gt;
&lt;li&gt;Cost-conscious architecture&lt;/li&gt;
&lt;li&gt;Observability and monitoring&lt;/li&gt;
&lt;li&gt;GitOps principles&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most tutorials show you how to deploy to ONE cloud. But what about multi-cloud? What about security scanning? What about cost optimization? This project answers all those questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture Overview&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  High-Level Architecture
&lt;/h2&gt;


&lt;p&gt;Infrastructure Components&lt;br&gt;
AWS Environment&lt;br&gt;
EKS Cluster (Kubernetes 1.28)&lt;/p&gt;


&lt;ol&gt;
&lt;li&gt;2x t3.medium SPOT instances (cost-optimized nodes)&lt;/li&gt;
&lt;li&gt;VPC with public/private subnets across 3 AZs&lt;/li&gt;
&lt;li&gt;NAT Gateway for private subnet internet access&lt;/li&gt;
&lt;li&gt;ECR for container registry&lt;/li&gt;
&lt;li&gt;Application Load Balancer for ingress&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Azure Environment&lt;br&gt;
AKS Cluster (Kubernetes 1.31)&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;1x Standard_D2s_v3 VM (auto-scaling enabled)&lt;/li&gt;
&lt;li&gt;VNet with subnet configuration&lt;/li&gt;
&lt;li&gt;ACR for container registry&lt;/li&gt;
&lt;li&gt;Azure Load Balancer for service exposure&lt;/li&gt;
&lt;li&gt;Network Security Groups for traffic control&lt;/li&gt;
&lt;/ol&gt;
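&lt;p&gt;The VPC layout above (public/private subnets spread across 3 AZs) can be sketched numerically. The CIDR ranges and region names below are illustrative assumptions, not taken from the actual Terraform code:&lt;/p&gt;

```python
# Carve one /20 subnet per availability zone out of an assumed 10.0.0.0/16 VPC.
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=20))[:3]  # one per AZ
for az, net in zip(["a", "b", "c"], subnets):
    print(f"us-east-1{az}: {net}")
```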
&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;p&gt;Core Technologies&lt;/p&gt;
&lt;h2&gt;
  
  
  Why These Choices?
&lt;/h2&gt;

&lt;p&gt;FastAPI: Modern, fast, async-capable Python framework with automatic API documentation.&lt;br&gt;
Terraform: Cloud-agnostic IaC tool allowing consistent infrastructure patterns across AWS and Azure.&lt;br&gt;
Helm: Templating and versioning for Kubernetes deployments, enabling environment-specific configurations.&lt;br&gt;
GitHub Actions: Native to GitHub, no additional CI/CD tools needed, excellent integration with cloud providers.&lt;br&gt;
Spot Instances: 70% cost savings on AWS compute while maintaining high availability across multiple AZs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation Journey&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1: Local Development&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Started with a simple FastAPI application:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="multi-cloud-devsecops-sample")

class Item(BaseModel):
    id: int
    name: str

@app.get("/", tags=["root"])
async def read_root():
    return {"status": "ok", "message": "Hello from Multi-Cloud DevSecOps sample"}

@app.get("/health", tags=["health"])
async def health_check():
    return {"status": "healthy"}

@app.get("/metrics", tags=["metrics"])
async def metrics():
    return {"requests_total": 0, "errors_total": 0}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Key Features Implemented&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Health check endpoint for Kubernetes probes&lt;br&gt;
Metrics endpoint for Prometheus&lt;br&gt;
RESTful CRUD operations&lt;br&gt;
Input validation with Pydantic&lt;br&gt;
Comprehensive unit tests with pytest&lt;/p&gt;
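&lt;p&gt;Stripped of the framework, the three endpoints above reduce to a tiny routing table. A framework-free sketch of the same contract (the 404 body shape is an assumption, not taken from the repo):&lt;/p&gt;

```python
import json

# Map each route to the JSON body the FastAPI app above returns.
ROUTES = {
    "/": {"status": "ok", "message": "Hello from Multi-Cloud DevSecOps sample"},
    "/health": {"status": "healthy"},
    "/metrics": {"requests_total": 0, "errors_total": 0},
}

def handle(path: str):
    """Return (status_code, json_body) for a GET request."""
    if path not in ROUTES:
        return 404, json.dumps({"detail": "Not Found"})
    return 200, json.dumps(ROUTES[path])

print(handle("/health"))  # (200, '{"status": "healthy"}')
```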

&lt;p&gt;&lt;strong&gt;Phase 2: Containerization&lt;/strong&gt;&lt;br&gt;
Created a multi-stage Dockerfile for optimized builds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Builder stage&lt;/span&gt;

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;python:3.11-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; — no-cache-dir — user &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Runtime stage&lt;/span&gt;

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.11-slim&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="c"&gt;# Security: Non-root user&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;groupadd &lt;span class="nt"&gt;-r&lt;/span&gt; appuser &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; useradd &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; appuser appuser

&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; appuser&lt;/span&gt;

&lt;span class="c"&gt;# Copy dependencies from builder&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; — from=builder — chown=appuser:appuser /root/.local /home/appuser/.local&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; — chown=appuser:appuser ./src ./src&lt;/span&gt;

&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; PATH=/home/appuser/.local/bin:$PATH&lt;/span&gt;

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8080&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; [“uvicorn”, “src.main:app”, “ — host”, “0.0.0.0”, “ — port”, “8080”]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Security Highlights&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Multi-stage build reduces image size by 60%&lt;/li&gt;
&lt;li&gt;Non-root user (UID 1000)&lt;/li&gt;
&lt;li&gt;Minimal base image (python:3.11-slim)&lt;/li&gt;
&lt;li&gt;No unnecessary packages&lt;/li&gt;
&lt;li&gt;Specific version pinning&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Result: image size reduced from 1.2GB to ~200MB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 3: Infrastructure as Code&lt;/strong&gt;&lt;br&gt;
Built complete Terraform modules for both clouds.&lt;/p&gt;

&lt;p&gt;AWS Infrastructure (&lt;code&gt;terraform/aws/main.tf&lt;/code&gt;); for the full scripts and configuration, see the GitHub repository.&lt;/p&gt;

&lt;p&gt;Remote state management (S3 for AWS, Blob Storage for Azure)&lt;br&gt;
Modular design for reusability&lt;br&gt;
Environment-specific variables&lt;br&gt;
Consistent tagging strategy&lt;br&gt;
Security groups/NSGs with least privilege&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 4: CI/CD Pipeline&lt;/strong&gt;&lt;br&gt;
Built three GitHub Actions workflows:&lt;/p&gt;

&lt;p&gt;CI Pipeline (&lt;code&gt;.github/workflows/ci.yaml&lt;/code&gt;)&lt;br&gt;
CD Pipeline, AWS (&lt;code&gt;.github/workflows/cd-aws.yaml&lt;/code&gt;)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pipeline Features&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Automated testing on every commit&lt;br&gt;
Security scanning before deployment&lt;br&gt;
Separate workflows for AWS and Azure&lt;br&gt;
Manual deployment approval capability&lt;br&gt;
Rollback support via Helm&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 5: Kubernetes Deployment&lt;/strong&gt;&lt;br&gt;
Created Helm charts for flexible deployments:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm/chart/
├── Chart.yaml
├── templates/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── servicemonitor.yaml
│   └── ingress.yaml   (optional)
└── values.yaml
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Phase 6: Monitoring &amp;amp; Observability&lt;/strong&gt;&lt;br&gt;
Deployed the full observability stack using Helm (Prometheus/Grafana installation):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Add the Prometheus community chart repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  -f monitoring/prometheus-values.yaml \
  --namespace monitoring --create-namespace
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
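&lt;p&gt;A post-deploy smoke test fits naturally at the end of a CD pipeline like this one. A minimal framework-free sketch; the injectable &lt;code&gt;fetch&lt;/code&gt; callable and this pipeline step are illustrative assumptions, not part of the repo:&lt;/p&gt;

```python
import json

def smoke_test(fetch) -> bool:
    """fetch() returns (status_code, body_bytes) for a GET on /health."""
    status, body = fetch()
    return status == 200 and json.loads(body).get("status") == "healthy"

# Simulated healthy response, i.e. what the deployed service would return:
print(smoke_test(lambda: (200, b'{"status": "healthy"}')))  # True
```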



&lt;p&gt;&lt;strong&gt;Grafana Dashboard — Custom dashboard tracking:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Request rate and latency&lt;/li&gt;
&lt;li&gt;Error rates (4xx, 5xx)&lt;/li&gt;
&lt;li&gt;Pod CPU and memory usage&lt;/li&gt;
&lt;li&gt;Kubernetes health metrics&lt;/li&gt;
&lt;li&gt;Container restart count&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Security Implementation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Multi-Layer Security Approach:&lt;br&gt;
Container Security&lt;br&gt;
Infrastructure Security&lt;br&gt;
Pod Security Context&lt;br&gt;
Network Security&lt;/p&gt;

&lt;p&gt;AWS Security Groups with minimal ingress rules&lt;br&gt;
Azure Network Security Groups&lt;br&gt;
Private subnets for worker nodes&lt;br&gt;
NAT Gateway for controlled egress&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Secrets Management&lt;/strong&gt;&lt;br&gt;
GitHub Secrets for credentials&lt;br&gt;
Kubernetes Service Accounts with RBAC&lt;br&gt;
ACR/ECR authentication via managed identities&lt;br&gt;
No hardcoded secrets in code&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics Collection&lt;/strong&gt;: Prometheus targets&lt;br&gt;
Kubernetes API server&lt;br&gt;
Kubelet metrics&lt;br&gt;
Node Exporter (system metrics)&lt;br&gt;
kube-state-metrics (K8s object states)&lt;br&gt;
Application &lt;code&gt;/metrics&lt;/code&gt; endpoint&lt;/p&gt;
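&lt;p&gt;The "no hardcoded secrets" rule above usually means reading credentials from the environment at runtime. A minimal sketch; the variable names &lt;code&gt;REGISTRY_USERNAME&lt;/code&gt;/&lt;code&gt;REGISTRY_TOKEN&lt;/code&gt; are illustrative, not from the repo:&lt;/p&gt;

```python
import os

def get_registry_credentials():
    # Credentials are injected into the environment (e.g. from GitHub Secrets),
    # never committed to source control.
    user = os.environ.get("REGISTRY_USERNAME")
    token = os.environ.get("REGISTRY_TOKEN")
    if not user or not token:
        raise RuntimeError("registry credentials not configured")
    return user, token
```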

&lt;p&gt;&lt;strong&gt;Grafana Dashboards&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Application Dashboard&lt;br&gt;
Request rate (requests/sec)&lt;br&gt;
Average latency (ms)&lt;br&gt;
Error rate percentage&lt;br&gt;
Top endpoints by traffic&lt;br&gt;
Response time distribution (P50, P95, P99)&lt;/li&gt;
&lt;li&gt;Infrastructure Dashboard&lt;br&gt;
Cluster resource utilization&lt;br&gt;
Node CPU/Memory/Disk usage&lt;br&gt;
Pod distribution across nodes&lt;br&gt;
Network I/O&lt;br&gt;
Persistent volume usage&lt;/li&gt;
&lt;li&gt;Kubernetes Dashboard&lt;br&gt;
Pod status overview&lt;br&gt;
Deployment health&lt;br&gt;
Container restart trends&lt;br&gt;
Resource quota usage&lt;br&gt;
Namespace metrics&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Monitoring Access:&lt;br&gt;
Azure Grafana: xxxxxx&lt;br&gt;
Credentials: xxxx&lt;br&gt;
Retention: 7 days of metrics&lt;/p&gt;
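&lt;p&gt;The P50/P95/P99 panels above boil down to a percentile computation. A small nearest-rank sketch over illustrative latency samples (the numbers are made up for the example):&lt;/p&gt;

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the value at rank ceil(p/100 * n)."""
    s = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]

latencies_ms = [40, 42, 45, 47, 50, 52, 60, 75, 89, 120]
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))  # 50 120
```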


&lt;p&gt;&lt;strong&gt;Cost Optimization&lt;/strong&gt;&lt;br&gt;
The Cost Challenge&lt;/p&gt;

&lt;p&gt;Initial deployment costs were running at $253/month:&lt;br&gt;
AWS: $136.45/month&lt;br&gt;
Azure: $97/month&lt;br&gt;
S3/Blob state: $0.04/month&lt;br&gt;
This was too high for a learning project. Here’s how I optimized:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost Reduction Strategies&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Spot Instances (AWS)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;eks_managed_node_groups = {
  main = {
    capacity_type  = "SPOT"        # 70% savings vs On-Demand
    instance_types = ["t3.medium"]
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Savings: $21/month (from $51 to $30)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Single NAT Gateway&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;enable_nat_gateway = true
single_nat_gateway = true  # Instead of one per AZ
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Savings: $64/month (from $96 to $32)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Right-Sized VMs&lt;/strong&gt;&lt;br&gt;
AWS: t3.medium (2 vCPU, 4GB RAM), adequate for dev&lt;br&gt;
Azure: Standard_D2s_v3 (2 vCPU, 8GB RAM)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Auto-Scaling&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;autoscaling:
  minReplicas: 1  # Scale down to 1 during low traffic
  maxReplicas: 4
  targetCPUUtilizationPercentage: 80
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
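&lt;p&gt;Under the hood, the HPA picks a replica count with desired = ceil(current × currentUtilization / target), clamped to the min/max bounds above. A small sketch of that rule:&lt;/p&gt;

```python
import math

def hpa_desired_replicas(current_replicas, current_cpu_pct,
                         target_cpu_pct=80, min_replicas=1, max_replicas=4):
    # desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric),
    # clamped to the configured bounds.
    desired = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, desired))

print(hpa_desired_replicas(2, 120))  # load above target -> 3 replicas
```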

&lt;p&gt;&lt;strong&gt;5. Destroy When Not in Use&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Stop everything at the end of the day
./scripts/destroy-aws-infrastructure.sh
./scripts/destroy-azure-infrastructure.sh

# Recreate the next morning (~30 minutes)
./scripts/deploy-aws-infrastructure.sh
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Final Cost Breakdown&lt;/strong&gt;&lt;br&gt;
Current state (infrastructure destroyed, state only):&lt;/p&gt;

&lt;p&gt;AWS: $0.02/month (S3 state storage)&lt;br&gt;
Azure: $5.02/month (ACR Basic + Blob state)&lt;br&gt;
Total: $5.04/month (96% reduction!)&lt;/p&gt;

&lt;p&gt;Active development (when needed):&lt;br&gt;
AWS (8 hours/day): ~$1.50/day = $45/month&lt;br&gt;
Azure (24/7 minimal): $5.02/month&lt;br&gt;
Total: ~$50/month for active development&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost Comparison&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Scenario&lt;/th&gt;&lt;th&gt;Monthly Cost&lt;/th&gt;&lt;th&gt;Best For&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;24/7 Production&lt;/td&gt;&lt;td&gt;$253&lt;/td&gt;&lt;td&gt;Always-on production&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;8hr/day Dev&lt;/td&gt;&lt;td&gt;$50&lt;/td&gt;&lt;td&gt;Active development&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Weekly Demos&lt;/td&gt;&lt;td&gt;$5–10&lt;/td&gt;&lt;td&gt;Portfolio/interviews&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Destroyed (Current)&lt;/td&gt;&lt;td&gt;$5&lt;/td&gt;&lt;td&gt;Learning/Idle&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
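&lt;p&gt;A quick sanity check on the headline "96% reduction" figure, using the $141 → $5.04 monthly numbers quoted in the article (arithmetic only):&lt;/p&gt;

```python
baseline = 141.0  # $/month before optimization
current = 5.04    # $/month with infrastructure destroyed (state only)
reduction_pct = (baseline - current) / baseline * 100
print(f"{reduction_pct:.0f}% reduction")  # 96% reduction
```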

&lt;p&gt;&lt;strong&gt;ROI on Cost Optimization&lt;/strong&gt;&lt;br&gt;
Annual cost: $2,976/year (24/7) vs $60/year (destroyed)&lt;br&gt;
Time to recreate: 30 minutes&lt;br&gt;
Infrastructure is code: can rebuild anytime&lt;br&gt;
Key lesson: don’t pay for idle infrastructure!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Results &amp;amp; Metrics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Deployment Success Metrics&lt;br&gt;
Infrastructure provisioning:&lt;br&gt;
AWS EKS: 28 minutes (fully automated)&lt;br&gt;
Azure AKS: 22 minutes (fully automated)&lt;br&gt;
Success rate: 100% (reproducible builds)&lt;/p&gt;

&lt;p&gt;Application deployment:&lt;br&gt;
Build time: 3–5 minutes (multi-stage Docker build)&lt;br&gt;
Push to registry: 1 minute&lt;br&gt;
Helm deployment: 2 minutes&lt;br&gt;
Total CI/CD duration: 8–10 minutes&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Application Performance&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;/th&gt;&lt;th&gt;AWS EKS&lt;/th&gt;&lt;th&gt;Azure AKS&lt;/th&gt;&lt;th&gt;Target&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Availability&lt;/td&gt;&lt;td&gt;99.9%&lt;/td&gt;&lt;td&gt;99.9%&lt;/td&gt;&lt;td&gt;99.5%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Avg Response Time&lt;/td&gt;&lt;td&gt;45ms&lt;/td&gt;&lt;td&gt;52ms&lt;/td&gt;&lt;td&gt;&amp;lt;100ms&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;P95 Latency&lt;/td&gt;&lt;td&gt;89ms&lt;/td&gt;&lt;td&gt;95ms&lt;/td&gt;&lt;td&gt;&amp;lt;200ms&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Throughput&lt;/td&gt;&lt;td&gt;1000 req/s&lt;/td&gt;&lt;td&gt;950 req/s&lt;/td&gt;&lt;td&gt;500 req/s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Error Rate&lt;/td&gt;&lt;td&gt;0.01%&lt;/td&gt;&lt;td&gt;0.01%&lt;/td&gt;&lt;td&gt;&amp;lt;1%&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Resource Utilization:&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Resource&lt;/th&gt;&lt;th&gt;Requested&lt;/th&gt;&lt;th&gt;Used (Avg)&lt;/th&gt;&lt;th&gt;Efficiency&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;CPU&lt;/td&gt;&lt;td&gt;250m&lt;/td&gt;&lt;td&gt;45m&lt;/td&gt;&lt;td&gt;18%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Memory&lt;/td&gt;&lt;td&gt;256Mi&lt;/td&gt;&lt;td&gt;128Mi&lt;/td&gt;&lt;td&gt;50%&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Note: Low utilization is expected for this demo app. Production apps would scale based on actual load.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security Metrics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;0 critical vulnerabilities in production images&lt;br&gt;
0 high-severity IaC issues&lt;br&gt;
100% secret coverage (no hardcoded credentials)&lt;br&gt;
Pod Security standards enforced&lt;br&gt;
Network Policies implemented&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Testing Coverage&lt;/strong&gt;&lt;br&gt;
Total tests: 12&lt;br&gt;
Passed: 12&lt;br&gt;
Failed: 0&lt;br&gt;
Coverage: 85%&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CI/CD Metrics&lt;/strong&gt;&lt;br&gt;
Build success rate: 98% (2 failures due to flaky tests)&lt;br&gt;
Average build time: 8 minutes&lt;br&gt;
Deployment frequency: on-demand (GitOps ready)&lt;br&gt;
Lead time: &amp;lt;15 minutes (code to production)&lt;br&gt;
MTTR: &amp;lt;30 minutes (rollback capability)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Worked Well&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Infrastructure as Code
Terraform modules made multi-environment deployments trivial
Remote state management prevented conflicts
Destroy/recreate workflow enabled cost savings&lt;/li&gt;
&lt;li&gt;Helm for Kubernetes
Environment-specific values files simplified configuration
Version control for deployments
Easy rollback capabilities&lt;/li&gt;
&lt;li&gt;Multi-Stage Docker Builds
60% reduction in image size
Faster deployments
Better security (minimal attack surface)&lt;/li&gt;
&lt;li&gt;GitHub Actions
Native integration with GitHub
No additional CI/CD infrastructure needed
Secrets management built-in&lt;/li&gt;
&lt;li&gt;Spot Instances&lt;br&gt;
70% cost savings on AWS compute&lt;br&gt;
No noticeable impact on availability (for dev/test)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Challenges Faced&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Terraform state lock&lt;br&gt;
Lesson: always clean up failed applies and use a DynamoDB lock table.&lt;/p&gt;

&lt;p&gt;EKS node group deletion:&lt;br&gt;
&lt;code&gt;aws eks delete-nodegroup --cluster-name &amp;lt;cluster-name&amp;gt; --nodegroup-name &amp;lt;nodegroup-name&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Lesson: understand resource dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ACR Naming Restrictions&lt;/strong&gt;&lt;br&gt;
Azure Container Registry names must be lowercase alphanumeric.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I’d Do Differently&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Start with Local Kubernetes&lt;br&gt;
Use kind/minikube for initial development&lt;br&gt;
Only move to cloud for integration testing&lt;br&gt;
Would have saved 2 weeks of cloud costs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement GitOps Sooner&lt;br&gt;
ArgoCD or Flux for declarative deployments&lt;br&gt;
Better visibility into deployment state&lt;br&gt;
Automatic sync from Git&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add Service Mesh Earlier&lt;br&gt;
Better traffic management&lt;br&gt;
Enhanced observability&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;More Comprehensive Monitoring&lt;br&gt;
Log aggregation with Loki from day 1&lt;br&gt;
Distributed tracing with Jaeger&lt;br&gt;
Custom application metrics&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Automated Cost Tracking&lt;br&gt;
Daily cost reports via AWS Cost Explorer API&lt;br&gt;
Budget alerts in Slack&lt;br&gt;
Dashboard showing spend by service&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaways for DevOps Engineers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure as Code is Essential&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Version control your infrastructure&lt;br&gt;
Make it reproducible&lt;br&gt;
Destroy and recreate confidently&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security is Not Optional&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Scan early and often&lt;br&gt;
Implement least privilege&lt;br&gt;
No secrets in code, ever&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost Awareness Matters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Monitor spending from day 1&lt;br&gt;
Use spot instances for non-critical workloads&lt;br&gt;
Destroy what you don’t use&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability from the Start&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Logs, metrics, and traces&lt;br&gt;
You can’t improve what you can’t measure&lt;br&gt;
Dashboards tell stories&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automation Saves Time&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;30 minutes to recreate infrastructure&lt;/li&gt;
&lt;li&gt;Consistent, repeatable deployments&lt;/li&gt;
&lt;li&gt;Focus on building, not clicking&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;For Job Seekers&lt;/strong&gt;&lt;br&gt;
This project demonstrates:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Real-world DevOps practices&lt;/li&gt;
&lt;li&gt;Multi-cloud expertise&lt;/li&gt;
&lt;li&gt;Security-first mindset&lt;/li&gt;
&lt;li&gt;Cost optimization skills&lt;/li&gt;
&lt;li&gt;Problem-solving ability&lt;/li&gt;
&lt;li&gt;Documentation skills&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Portfolio Value: Shows you can build production-grade infrastructure, not just follow tutorials.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources &amp;amp; Documentation&lt;/strong&gt;&lt;br&gt;
Project Repository&lt;br&gt;
🔗 &lt;a href="https://github.com/abidaslam892/multi-cloud-devsecops" rel="noopener noreferrer"&gt;github.com/abidaslam892/multi-cloud-devsecops&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Documentation Files&lt;br&gt;
&lt;a href="https://github.com/abidaslam892/multi-cloud-devsecops/blob/main/SETUP.md" rel="noopener noreferrer"&gt;Setup Guide&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/abidaslam892/multi-cloud-devsecops/blob/main/DEPLOY.md" rel="noopener noreferrer"&gt;Deployment Guide&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/abidaslam892/multi-cloud-devsecops/blob/main/ACCESS-GUIDE.md" rel="noopener noreferrer"&gt;Access Guide&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/abidaslam892/multi-cloud-devsecops/blob/main/COST-OPTIMIZATION.md" rel="noopener noreferrer"&gt;Cost Optimization&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/abidaslam892/multi-cloud-devsecops/blob/main/docs/monitoring-setup.md" rel="noopener noreferrer"&gt;Monitoring Setup&lt;/a&gt;&lt;br&gt;
Technologies Used&lt;br&gt;
&lt;a href="https://fastapi.tiangolo.com/" rel="noopener noreferrer"&gt;FastAPI Documentation&lt;/a&gt;&lt;br&gt;
&lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs" rel="noopener noreferrer"&gt;Terraform AWS Provider&lt;/a&gt;&lt;br&gt;
&lt;a href="https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs" rel="noopener noreferrer"&gt;Terraform Azure Provider&lt;/a&gt;&lt;br&gt;
&lt;a href="https://helm.sh/docs/" rel="noopener noreferrer"&gt;Helm Documentation&lt;/a&gt;&lt;br&gt;
&lt;a href="https://kubernetes.io/docs/" rel="noopener noreferrer"&gt;Kubernetes Documentation&lt;/a&gt;&lt;br&gt;
&lt;a href="https://prometheus.io/docs/" rel="noopener noreferrer"&gt;Prometheus Documentation&lt;/a&gt;&lt;br&gt;
&lt;a href="https://grafana.com/docs/" rel="noopener noreferrer"&gt;Grafana Documentation&lt;/a&gt;&lt;br&gt;
Tools &amp;amp; Security&lt;br&gt;
&lt;a href="https://github.com/aquasecurity/trivy" rel="noopener noreferrer"&gt;Trivy Scanner&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.checkov.io/" rel="noopener noreferrer"&gt;Checkov IaC Scanner&lt;/a&gt;&lt;br&gt;
&lt;a href="https://docs.github.com/en/actions" rel="noopener noreferrer"&gt;GitHub Actions&lt;/a&gt;&lt;br&gt;
Connect With Me&lt;br&gt;
I’d love to hear your feedback, questions, or suggestions!&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/abidaslam892" rel="noopener noreferrer"&gt;@abidaslam892&lt;/a&gt;&lt;br&gt;
Repository: &lt;a href="https://github.com/abidaslam892/multi-cloud-devsecops" rel="noopener noreferrer"&gt;multi-cloud-devsecops&lt;/a&gt;&lt;br&gt;
Email: &lt;a href="mailto:abidaslam.123@gmail.com"&gt;abidaslam.123@gmail.com&lt;/a&gt;&lt;br&gt;
LinkedIn: linkedin.com/in/abid-aslam-75520330&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Evidence &amp;amp; Screenshots&lt;/strong&gt;&lt;br&gt;
See the &lt;code&gt;blog-materials/evidence&lt;/code&gt; folder in the repository for:&lt;/p&gt;

&lt;p&gt;AWS Console screenshots (EKS, ECR, VPC)&lt;br&gt;
Azure Portal screenshots (AKS, ACR)&lt;br&gt;
Grafana dashboards&lt;br&gt;
CI/CD pipeline runs&lt;br&gt;
Cost reports&lt;br&gt;
Security scan results&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Acknowledgments&lt;/strong&gt;&lt;br&gt;
The open-source community for amazing tools&lt;br&gt;
Terraform AWS/Azure module maintainers&lt;br&gt;
The GitHub Actions team&lt;br&gt;
Everyone who contributed to the technologies used&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;br&gt;
Building this project taught me that &lt;strong&gt;DevOps is not about tools; it’s about culture and practices&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Automate everything you can&lt;br&gt;
Treat infrastructure as code&lt;br&gt;
Security is everyone’s responsibility&lt;br&gt;
Monitor, measure, improve&lt;br&gt;
Share knowledge (hence this blog!)&lt;br&gt;
If you’re learning DevOps, I encourage you to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Build something real (not just tutorials)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make mistakes and learn from them&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Document your journey&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Share with the community&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Remember: The best way to learn is by doing. Start small, iterate, and keep building!&lt;/p&gt;

&lt;p&gt;If this article helped you, please give it a ⭐ star on &lt;a href="https://github.com/abidaslam892/multi-cloud-devsecops" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and share it with others!&lt;/p&gt;

&lt;p&gt;#DevOps #AWS #Azure #Kubernetes #Terraform #CI/CD #CloudNative #Security #DevSecOps #MultiCloud #Docker #Helm #Prometheus #Grafana #Python #FastAPI #Infrastructure #Automation&lt;/p&gt;


</description>
      <category>webdev</category>
      <category>linux</category>
      <category>github</category>
    </item>
    <item>
      <title>Production Monitoring Made Easy: Prometheus, Grafana, and Docker Explained</title>
      <dc:creator>abidaslam892</dc:creator>
      <pubDate>Sat, 15 Nov 2025 15:42:56 +0000</pubDate>
      <link>https://forem.com/abidaslam892/production-monitoring-made-easy-prometheus-grafana-and-docker-explained-mj4</link>
      <guid>https://forem.com/abidaslam892/production-monitoring-made-easy-prometheus-grafana-and-docker-explained-mj4</guid>
      <description>&lt;p&gt;&lt;strong&gt;Original Post:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/design-bootcamp/production-monitoring-made-easy-prometheus-grafana-and-docker-explained-f373607102ed" rel="noopener noreferrer"&gt;https://medium.com/design-bootcamp/production-monitoring-made-easy-prometheus-grafana-and-docker-explained-f373607102ed&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;From Zero to Observability: Building a Production-Grade Monitoring Stack with Prometheus &amp;amp; Grafana&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;br&gt;
In today’s cloud-native world, monitoring isn’t optional — it’s essential. Whether you’re running a small side project or managing enterprise infrastructure, you need visibility into your systems. But setting up monitoring shouldn’t require weeks of configuration and a PhD in DevOps.&lt;/p&gt;

&lt;p&gt;In this comprehensive guide, I’ll walk you through building a production-ready monitoring stack using three powerful open-source tools:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Docker for containerization&lt;/li&gt;
&lt;li&gt;Prometheus for metrics collection&lt;/li&gt;
&lt;li&gt;Grafana for visualization&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  By the end of this tutorial, you’ll have:
&lt;/h2&gt;

&lt;p&gt;A fully functional monitoring stack running in containers&lt;br&gt;
Real-time system metrics from your infrastructure&lt;br&gt;
Beautiful, interactive dashboards&lt;br&gt;
Knowledge to extend and customize for your needs&lt;/p&gt;

&lt;p&gt;Time to complete: 30 minutes&lt;br&gt;
Skill level: Beginner to Intermediate&lt;br&gt;
Prerequisites: Basic command-line knowledge, Docker installed&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why This Stack?&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;The Problem&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Traditional monitoring setups are often:&lt;/p&gt;

&lt;p&gt;Complex: multiple services, complicated configurations&lt;br&gt;
Expensive: enterprise solutions cost thousands per month&lt;br&gt;
Inflexible: vendor lock-in limits customization&lt;br&gt;
Hard to scale: difficult to add new metrics or exporters&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution&lt;/strong&gt;&lt;br&gt;
Our stack solves these problems:&lt;/p&gt;

&lt;p&gt;Simple: deploy everything with one command&lt;br&gt;
Free &amp;amp; Open Source: no licensing costs&lt;br&gt;
Highly customizable: full control over metrics and dashboards&lt;br&gt;
Scalable: easy to add exporters and federate Prometheus&lt;/p&gt;
&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Here's what we're building:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Components:&lt;/strong&gt;&lt;br&gt;
Prometheus: collects and stores time-series metrics&lt;br&gt;
Grafana: creates beautiful dashboards and visualizations&lt;br&gt;
Node Exporter: exposes system-level metrics (CPU, RAM, disk)&lt;br&gt;
Application Exporter: custom metrics from your applications&lt;/p&gt;
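&lt;p&gt;Node Exporter and the other components publish metrics in Prometheus' plain-text exposition format. A tiny illustrative parser for pulling one sample out of such a payload (the metric name is just an example):&lt;/p&gt;

```python
def parse_metric(payload: str, name: str) -> float:
    """Return the value of the first sample whose metric name matches."""
    for line in payload.splitlines():
        if line.startswith(name + " "):
            return float(line.split()[1])
    raise KeyError(name)

sample = "# TYPE node_load1 gauge\nnode_load1 0.42\n"
print(parse_metric(sample, "node_load1"))  # 0.42
```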
&lt;h2&gt;
  
  
  Part 1: Setting Up the Foundation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Prepare Your Environment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;First, ensure you have Docker and Docker Compose installed:&lt;/p&gt;

&lt;p&gt;Check Docker version&lt;/p&gt;

&lt;p&gt;&lt;code&gt;docker --version&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Docker version 20.10.0 or higher required&lt;/p&gt;

&lt;p&gt;Check Docker Compose version&lt;/p&gt;

&lt;p&gt;&lt;code&gt;docker-compose --version&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Docker Compose version 2.20.0 or higher recommended&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 2: Create Project Structure
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Create project directory
mkdir monitoring-stack &amp;amp;&amp;amp; cd monitoring-stack

# Create necessary directories
mkdir -p prometheus grafana src
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 3: Configure Prometheus
&lt;/h2&gt;

&lt;p&gt;Create &lt;code&gt;prometheus/prometheus.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;global:
  scrape_interval: 15s      # Scrape targets every 15 seconds
  evaluation_interval: 15s  # Evaluate rules every 15 seconds

scrape_configs:
  # Prometheus monitors itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Node Exporter — system metrics
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['host.docker.internal:9100']
    scrape_interval: 15s

  # Custom application metrics
  - job_name: 'application'
    static_configs:
      - targets: ['host.docker.internal:8000']
    metrics_path: '/metrics'
    scrape_interval: 5s
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;What's happening here?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;scrape_interval&lt;/code&gt;: How often Prometheus collects metrics&lt;br&gt;
&lt;code&gt;job_name&lt;/code&gt;: Logical grouping for targets&lt;br&gt;
&lt;code&gt;targets&lt;/code&gt;: Where to find metrics endpoints&lt;br&gt;
&lt;code&gt;host.docker.internal&lt;/code&gt;: Allows containers to reach the host machine&lt;/p&gt;

&lt;h2&gt;Part 2: Docker Compose Configuration&lt;/h2&gt;

&lt;p&gt;Create &lt;code&gt;docker-compose.yml&lt;/code&gt; in your project root:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9091:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
    restart: unless-stopped
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
    restart: unless-stopped
    networks:
      - monitoring
    depends_on:
      - prometheus

volumes:
  prometheus_data:
  grafana_data:

networks:
  monitoring:
    driver: bridge
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h2&gt;Key Configuration Details&lt;/h2&gt;

&lt;p&gt;1. Ports: Prometheus on 9091, Grafana on 3000&lt;br&gt;
2. Volumes: Persist data even if containers restart&lt;br&gt;
3. Networks: Isolated bridge network for service communication&lt;br&gt;
4. Retention: Keep metrics for 30 days&lt;br&gt;
5. Restart Policy: Automatically restart on failure&lt;/p&gt;

&lt;h2&gt;Part 3: Installing Node Exporter&lt;/h2&gt;

&lt;p&gt;Node Exporter provides system-level metrics. Install it on your host machine:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Create a dedicated user
sudo useradd --no-create-home --shell /bin/false node_exporter

# Download Node Exporter
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz

# Extract and install
tar xzf node_exporter-1.7.0.linux-amd64.tar.gz
sudo cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Create a systemd service at &lt;code&gt;/etc/systemd/system/node_exporter.service&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Start the service:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter

# Verify it's running
curl http://localhost:9100/metrics | head -20
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;You should see metrics output like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="user"} 890.12
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
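&lt;p&gt;If you ever need to script against a &lt;code&gt;/metrics&lt;/code&gt; endpoint, the text exposition format is easy to parse. Here is a minimal sketch using only the standard library, with a sample line taken from the output above (for real work, prefer the parser that ships with &lt;code&gt;prometheus_client&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def parse_sample(line):
    # Split "name{labels} value" into its three parts
    name_part, value = line.rsplit(' ', 1)
    labels = {}
    if '{' in name_part:
        name, label_blob = name_part.split('{', 1)
        for pair in label_blob.rstrip('}').split(','):
            key, val = pair.split('=', 1)
            labels[key] = val.strip('"')
    else:
        name = name_part
    return name, labels, float(value)

name, labels, value = parse_sample('node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67')
print(name, labels['mode'], value)  # node_cpu_seconds_total idle 12345.67
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;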

&lt;h2&gt;Part 4: Creating a Custom Metrics Exporter&lt;/h2&gt;

&lt;p&gt;Let's create a simple Python application that exposes custom metrics.&lt;/p&gt;

&lt;p&gt;Create &lt;code&gt;src/metrics_exporter.py&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;#!/usr/bin/env python3
"""
Simple Prometheus Metrics Exporter
Demonstrates how to instrument your applications
"""
from prometheus_client import start_http_server, Counter, Gauge, Histogram
import psutil
import random
import time

# Define application metrics
request_count = Counter(
    'app_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status']
)

active_users = Gauge(
    'app_active_users',
    'Number of active users'
)

response_time = Histogram(
    'app_response_time_seconds',
    'Response time in seconds',
    buckets=[0.1, 0.5, 1.0, 2.0, 5.0]
)

# System metrics
cpu_gauge = Gauge('system_cpu_percent', 'CPU usage percentage')
memory_gauge = Gauge('system_memory_percent', 'Memory usage percentage')
disk_gauge = Gauge('system_disk_percent', 'Disk usage percentage')


def collect_system_metrics():
    # Collect system metrics using psutil
    cpu_gauge.set(psutil.cpu_percent(interval=1))
    memory_gauge.set(psutil.virtual_memory().percent)
    disk_gauge.set(psutil.disk_usage('/').percent)


def simulate_application_activity():
    # Simulate application metrics for demo purposes
    methods = ['GET', 'POST', 'PUT', 'DELETE']
    endpoints = ['/api/users', '/api/orders', '/api/products']
    statuses = [200, 201, 400, 404, 500]

    # Simulate a request
    method = random.choice(methods)
    endpoint = random.choice(endpoints)
    status = random.choices(statuses, weights=[85, 10, 3, 1, 1])[0]
    request_count.labels(method=method, endpoint=endpoint, status=status).inc()

    # Simulate response time
    response_time.observe(random.uniform(0.05, 2.0))

    # Update active users
    active_users.set(random.randint(10, 100))


def main():
    """Main exporter loop"""
    # Start metrics server on port 8000
    PORT = 8000
    start_http_server(PORT)
    print(f"Metrics server started on port {PORT}")
    print(f"Metrics available at http://localhost:{PORT}/metrics")

    # Collect real and simulated metrics forever
    while True:
        collect_system_metrics()
        simulate_application_activity()
        time.sleep(5)


if __name__ == '__main__':
    main()
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Create &lt;code&gt;requirements.txt&lt;/code&gt; listing the two dependencies the exporter imports: &lt;code&gt;prometheus-client&lt;/code&gt; and &lt;code&gt;psutil&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Create &lt;code&gt;start_exporter.sh&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/bin/bash
# Check if Python is installed
if ! command -v python3 &amp;amp;&amp;gt; /dev/null; then
  echo "Python 3 is not installed"
  exit 1
fi

# Install dependencies
pip3 install -r requirements.txt

# Start the exporter
python3 src/metrics_exporter.py
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h2&gt;Part 5: Launching the Stack&lt;/h2&gt;

&lt;p&gt;Now we're ready to start everything:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Start Prometheus and Grafana
docker-compose up -d

# Check if containers are running
docker-compose ps
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;You should see both containers up:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NAME         IMAGE                    STATUS
grafana      grafana/grafana:latest   Up
prometheus   prom/prometheus:latest   Up
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h2&gt;Access your services&lt;/h2&gt;

&lt;p&gt;Prometheus: http://localhost:9091&lt;br&gt;
Grafana: http://localhost:3000 (admin/admin)&lt;br&gt;
Node Exporter metrics: http://localhost:9100/metrics&lt;br&gt;
Application metrics: http://localhost:8000/metrics&lt;/p&gt;

&lt;h2&gt;Part 6: Configuring Grafana&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Add Prometheus as a Data Source&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open Grafana at http://localhost:3000&lt;br&gt;
Log in with &lt;code&gt;admin&lt;/code&gt; / &lt;code&gt;admin&lt;/code&gt; (change the password when prompted)&lt;br&gt;
Go to Configuration → Data Sources&lt;br&gt;
Click Add data source&lt;br&gt;
Select Prometheus&lt;br&gt;
Set URL: &lt;code&gt;http://prometheus:9090&lt;/code&gt;&lt;br&gt;
Click Save &amp;amp; Test&lt;/p&gt;

&lt;p&gt;You should see: "Data source is working"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Import a Dashboard&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Go to Dashboards → Import&lt;br&gt;
Enter dashboard ID 1860 (Node Exporter Full)&lt;br&gt;
Click Load&lt;br&gt;
Select Prometheus as the data source&lt;br&gt;
Click Import&lt;/p&gt;

&lt;p&gt;You now have a dashboard showing:&lt;br&gt;
CPU usage across all cores&lt;br&gt;
Memory utilization&lt;br&gt;
Disk space and I/O&lt;br&gt;
Network traffic&lt;br&gt;
System load&lt;/p&gt;

&lt;h2&gt;Part 7: Creating Custom Dashboards&lt;/h2&gt;

&lt;p&gt;Let's create a custom dashboard for our application metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Create a New Dashboard&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Click &lt;strong&gt;+&lt;/strong&gt; → &lt;strong&gt;Create Dashboard&lt;/strong&gt;, then click &lt;strong&gt;Add new panel&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Add a Request Rate Panel&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Query:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rate(app_requests_total[5m])
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
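&lt;p&gt;&lt;code&gt;rate()&lt;/code&gt; turns an ever-growing counter into a per-second rate over the window. Conceptually it is the increase divided by the elapsed time; the real implementation also handles counter resets and extrapolation. An illustrative sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def simple_rate(old_sample, new_sample):
    # Each sample is a (timestamp_seconds, counter_value) pair
    (t1, v1), (t2, v2) = old_sample, new_sample
    return (v2 - v1) / (t2 - t1)

# Counter grew from 100 to 160 requests in 30 seconds
print(simple_rate((0, 100), (30, 160)))  # 2.0 requests per second
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;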

&lt;p&gt;Panel Settings:&lt;br&gt;
Title: "HTTP Request Rate"&lt;br&gt;
Visualization: Time series&lt;br&gt;
Legend: &lt;code&gt;{{method}} {{endpoint}}&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Add Active Users Panel&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Query:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app_active_users
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Panel Settings:&lt;br&gt;
Title: "Active Users"&lt;br&gt;
Visualization: Stat&lt;br&gt;
Color: Based on thresholds (green &amp;lt; 50, yellow &amp;lt; 80, red &amp;gt;= 80)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Add Response Time Panel&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Query:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;histogram_quantile(0.95, rate(app_response_time_seconds_bucket[5m]))
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
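&lt;p&gt;&lt;code&gt;histogram_quantile&lt;/code&gt; estimates a quantile from cumulative bucket counts, interpolating linearly inside the bucket where the target rank falls. Here is a simplified, illustrative sketch (not the exact Prometheus implementation) with made-up bucket counts:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def bucket_quantile(q, buckets):
    # buckets: ascending (upper_bound, cumulative_count) pairs
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            # Linear interpolation within this bucket
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# 100 observations; the 95th percentile lands in the 1.0-2.0s bucket
print(bucket_quantile(0.95, [(0.1, 20), (0.5, 60), (1.0, 90), (2.0, 100)]))  # 1.5
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;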

&lt;p&gt;Panel Settings:&lt;br&gt;
Title: "95th Percentile Response Time"&lt;br&gt;
Visualization: Gauge&lt;br&gt;
Unit: seconds&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Add CPU Usage Panel&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Query:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Panel Settings:&lt;br&gt;
Title: "CPU Usage %"&lt;br&gt;
Visualization: Graph&lt;br&gt;
Thresholds: Yellow at 60%, Red at 80%&lt;/p&gt;

&lt;p&gt;Click Save dashboard and give it a name like "Application Monitoring".&lt;/p&gt;

&lt;h2&gt;Part 8: Understanding PromQL&lt;/h2&gt;

&lt;p&gt;Prometheus Query Language (PromQL) is powerful. Here are essential queries:&lt;/p&gt;

&lt;h2&gt;Basic Queries&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Get current value
node_memory_MemTotal_bytes

# Rate of change over 5 minutes
rate(node_cpu_seconds_total[5m])

# Average across all instances
avg(node_load1)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;h2&gt;
  
  
  Part 9: Setting Up Alerts
&lt;/h2&gt;

&lt;p&gt;Alerts notify you when things go wrong. Let’s configure some.&lt;/p&gt;

&lt;p&gt;Create &lt;code&gt;prometheus/alerts.yml&lt;/code&gt;:&lt;/p&gt;

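&lt;p&gt;The rules file appears only as an image in the original post; a typical &lt;code&gt;alerts.yml&lt;/code&gt; for this stack might look like the following (alert names and thresholds are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;groups:
  - name: system_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage above 80% for 5 minutes"
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.instance }} is down"
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;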

&lt;p&gt;Update &lt;code&gt;prometheus/prometheus.yml&lt;/code&gt; to include alerts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;
&lt;span class="na"&gt;global&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;scrape_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;15s&lt;/span&gt;

&lt;span class="na"&gt;evaluation_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;15s&lt;/span&gt;

&lt;span class="s"&gt;Load alert rules&lt;/span&gt;

&lt;span class="na"&gt;rule_files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;‘/etc/prometheus/alerts.yml’&lt;/span&gt;

&lt;span class="na"&gt;scrape_configs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="c1"&gt;# … (existing scrape configs)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Update &lt;code&gt;docker-compose.yml&lt;/code&gt; to mount the alerts file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;
&lt;span class="na"&gt;prometheus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="c1"&gt;# … (existing config)&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./prometheus/alerts.yml:/etc/prometheus/alerts.yml&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;prometheus_data:/prometheus&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart Prometheus:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose restart prometheus
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Check alerts at http://localhost:9091/alerts&lt;/p&gt;

&lt;h2&gt;Part 10: Production Best Practices&lt;/h2&gt;

&lt;h2&gt;Security&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Change Default Passwords&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Update &lt;code&gt;docker-compose.yml&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;grafana:
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Create `.env` file:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GRAFANA_PASSWORD=your_secure_password_here
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
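Docker Compose picks up the `.env` file automatically; for illustration, here is a minimal sketch of how its KEY=VALUE format can be parsed (the helper name is ours, and this skips Compose's quoting rules):

```python
def parse_env_file(text):
    """Parse KEY=VALUE lines in docker-compose's .env format.

    Blank lines and lines starting with '#' are ignored; the value is
    everything after the first '='.
    """
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # skip malformed lines with no '='
            env[key.strip()] = value.strip()
    return env

# The secret referenced by GF_SECURITY_ADMIN_PASSWORD above:
sample = "# grafana secrets\nGRAFANA_PASSWORD=your_secure_password_here\n"
print(parse_env_file(sample)["GRAFANA_PASSWORD"])  # your_secure_password_here
```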

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
**2. Use Read-Only Volumes**

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;volumes:
  - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
**3. Run as Non-Root User**

Resource Limits


Backup Strategy

Backup Prometheus data

docker run — rm \

-v prometheus_data:/data \

-v $(pwd)/backups:/backup \

alpine tar czf /backup/prometheus-$(date +%Y%m%d).tar.gz /data

Backup Grafana data

docker run — rm \

-v grafana_data:/data \

-v $(pwd)/backups:/backup \

alpine tar czf /backup/grafana-$(date +%Y%m%d).tar.gz /data

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  High Availability
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;For production, consider:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Prometheus Federation&lt;/strong&gt;: Multiple Prometheus instances&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Thanos&lt;/strong&gt;: Long-term storage and global view&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Grafana HA&lt;/strong&gt;: Multiple Grafana instances behind a load balancer&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Part 11: Troubleshooting Common Issues
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Issue 1: Container Won’t Start&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Check logs&lt;/span&gt;

docker-compose logs prometheus

docker-compose logs grafana

&lt;span class="c"&gt;# Common causes:&lt;/span&gt;

&lt;span class="c"&gt;# — Port already in use&lt;/span&gt;

&lt;span class="c"&gt;# — Configuration file syntax error&lt;/span&gt;

&lt;span class="c"&gt;# — Insufficient permissions&lt;/span&gt;

&lt;span class="k"&gt;**&lt;/span&gt;Issue 2: Grafana Can’t Connect to Prometheus&lt;span class="k"&gt;**&lt;/span&gt;

&lt;span class="k"&gt;**&lt;/span&gt;Problem: Data &lt;span class="nb"&gt;source test &lt;/span&gt;fails&lt;span class="k"&gt;**&lt;/span&gt;

Solution: Use container name, not localhost:

URL: http://prometheus:9090
URL: http://localhost:9091
Issue 3: No Metrics Showing

&lt;span class="c"&gt;# Check Prometheus targets&lt;/span&gt;

curl http://localhost:9091/api/v1/targets | jq

&lt;span class="c"&gt;# Verify exporters are reachable&lt;/span&gt;

curl http://localhost:9100/metrics

curl http://localhost:8000/metrics

&lt;span class="k"&gt;**&lt;/span&gt;Issue 4: Data Not Persisting&lt;span class="k"&gt;**&lt;/span&gt;

&lt;span class="c"&gt;# Check volume mounts&lt;/span&gt;

docker inspect prometheus | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-A&lt;/span&gt; 10 Mounts

&lt;span class="c"&gt;# Fix permissions (Prometheus runs as UID 65534)&lt;/span&gt;

&lt;span class="nb"&gt;sudo chown&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; 65534:65534 prometheus_data/

&lt;span class="c"&gt;## Part 12: Extending Your Stack&lt;/span&gt;

Add MySQL Monitoring

Add Nginx Monitoring

Add Redis Monitoring

&lt;span class="c"&gt;## Part 13: Real-World Use Cases&lt;/span&gt;

&lt;span class="k"&gt;**&lt;/span&gt;Use Case 1: E-commerce Platform&lt;span class="k"&gt;**&lt;/span&gt;

Metrics to track:

1. Order processing rate
2. Payment gateway latency
3. Inventory stock levels
4. User cart abandonment rate

&lt;span class="k"&gt;**&lt;/span&gt;Sample custom metrics:&lt;span class="k"&gt;**&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from prometheus_client import Counter, Gauge, Histogram

orders_total = Counter('orders_total', 'Total orders', ['status'])
payment_duration = Histogram('payment_duration_seconds', 'Payment processing time')
inventory_stock = Gauge('inventory_stock', 'Product stock level', ['product_id'])
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Use Case 2: API Service
Metrics to track:

Request rate per endpoint
Response time percentiles
Error rates by status code
Rate limiting hits
PromQL Queries:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Requests per second by endpoint
sum by (endpoint) (rate(api_requests_total[1m]))

# 99th percentile latency
histogram_quantile(0.99, rate(api_duration_seconds_bucket[5m]))

# Error rate
sum(rate(api_requests_total{status=~"5.."}[5m])) / sum(rate(api_requests_total[5m]))
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
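The error-rate expression above is just a ratio of rates. The same arithmetic in plain Python, with 5xx matching done by the same regex the PromQL selector uses (the sample numbers are made up):

```python
import re

def error_rate(rates_by_status):
    """Plain-Python mirror of the PromQL error-rate ratio:
    errors-per-second divided by total requests-per-second."""
    total = sum(rates_by_status.values())
    if total == 0:
        return 0.0
    errors = sum(v for s, v in rates_by_status.items() if re.fullmatch(r"5..", s))
    return errors / total

# req/s by HTTP status over the last 5m (illustrative numbers)
rates = {"200": 95.0, "404": 2.0, "500": 2.0, "503": 1.0}
print(error_rate(rates))  # 0.03
```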

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Use Case 3: Batch Processing Pipeline
Metrics to track:

Job completion time
Records processed per minute
Failed jobs count
Queue depth
Part 14: Performance Optimization
Optimize Prometheus Storage

Optimize Scrape Intervals

Use Recording Rules for Expensive Queries
Press enter or click to view image in full size

Then use the pre-computed metrics:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Instead of this expensive query:
rate(api_requests_total[5m])

# Use this:
job:api_request_rate:5m
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
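A recording rule simply evaluates the expensive expression on Prometheus's schedule and stores the result under the new name. To show what is being precomputed, here is a simplified sketch of the core of `rate()` over raw counter samples (real Prometheus also handles counter resets and extrapolates to the window boundaries):

```python
def simple_rate(samples):
    """Per-second increase across (timestamp, value) counter samples.

    Simplified sketch: assumes the counter never resets and skips
    Prometheus's extrapolation to the window boundaries.
    """
    if len(samples) in (0, 1):
        return 0.0
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# api_requests_total sampled every 15s over a 5m window (illustrative)
samples = [(t, 1000 + 10 * t) for t in range(0, 301, 15)]
print(simple_rate(samples))  # the counter grows by 10/s, so this prints 10.0
```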



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;


Conclusion
You’ve built a complete monitoring stack from scratch. Here’s what you’ve accomplished:

Deployed a containerized monitoring infrastructure
Configured Prometheus to collect metrics
Created beautiful Grafana dashboards
Instrumented a custom application
Set up alerts for critical issues
Learned PromQL for advanced queries
Applied production best practices
Key Takeaways
Docker makes deployment simple — One command starts everything
Prometheus is powerful — Time-series data with flexible querying
Grafana is beautiful — Create stunning, informative dashboards
Monitoring is essential — Know what’s happening in your systems
Start simple, extend gradually — Add exporters as you need them
Next Steps
Deploy to production — Use Docker Swarm or Kubernetes
Add more exporters — Monitor databases, message queues, etc.
Implement alerting — Connect to Slack, PagerDuty, or email
Long-term storage — Integrate Thanos for infinite retention
Advanced dashboards — Create business-specific metrics
Resources
GitHub Repository: (https://github.com/abidaslam892/Grafana-Prometheus-Monitoring-Deployment-)
Prometheus Docs: https://prometheus.io/docs/
Grafana Dashboards: https://grafana.com/grafana/dashboards/
PromQL Guide: https://prometheus.io/docs/prometheus/latest/querying/basics/
Docker Docs: https://docs.docker.com/
Questions?
Feel free to reach out in the comments below! I’d love to hear:

What are you monitoring?
What challenges did you face?
What metrics matter most to your business?
If this guide helped you, please:
- ⭐ Star the GitHub repository

- 👏 Clap for this article

- 🔗 Share with your team

- 💬 Leave a comment

Happy monitoring! 📊
#Docker #Prometheus #Grafana #DevOps #Monitoring #Kubernetes #CloudNative #SRE #Infrastructure #Tutorial
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>webdev</category>
      <category>productivity</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>DevOps Engineer to Cloud Architect</title>
      <dc:creator>abidaslam892</dc:creator>
      <pubDate>Fri, 14 Nov 2025 06:49:06 +0000</pubDate>
      <link>https://forem.com/abidaslam892/devops-engineer-to-cloud-architect-57cb</link>
      <guid>https://forem.com/abidaslam892/devops-engineer-to-cloud-architect-57cb</guid>
      <description>&lt;h2&gt;
  
  
  Originally published on Medium:
&lt;/h2&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://medium.com/towards-artificial-intelligence/devops-engineer-to-cloud-architect-8590efd51089" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Hello! I’m Abid Aslam, a DevOps Engineer and Cloud Solutions Architect with over 15 years of experience in telecom operations, infrastructure automation, and cloud computing. My journey through the Azure Resume Challenge wasn’t just about building a resume website — it was about showcasing the evolution of modern cloud architecture and demonstrating how traditional infrastructure expertise translates to cutting-edge cloud solutions.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this comprehensive article, I’ll walk you through my complete Azure Resume Challenge experience, the unique approaches I took, and the advanced blog system I built to share my technical expertise with the community.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Motivated Me to Take the Azure Resume Challenge
&lt;/h2&gt;

&lt;p&gt;As someone who has spent over a decade managing critical telecom BSS (Business Support Systems) and building enterprise-grade infrastructure, I wanted to demonstrate how traditional IT operations expertise translates to modern cloud architecture. The Azure Resume Challenge provided the perfect platform to showcase:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Modern Cloud Architecture&lt;/strong&gt;: Moving from traditional server management to serverless, scalable solutions&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DevOps Best Practices&lt;/strong&gt;: Implementing CI/CD pipelines, Infrastructure as Code, and automated testing&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full-Stack Development:&lt;/strong&gt; Combining backend engineering with modern frontend experiences&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical Writing&lt;/strong&gt;: Creating comprehensive documentation and sharing knowledge with the community&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project Architecture Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My implementation goes beyond the basic challenge requirements, incorporating enterprise-grade patterns and advanced features:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Frontend Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Static Website Hosting&lt;/strong&gt;: Azure Storage with $web container&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Custom Domain&lt;/strong&gt;: Professional domain with SSL/TLS encryption&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content Delivery&lt;/strong&gt;: Azure Front Door for global performance&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Responsive Design&lt;/strong&gt;: Mobile-first approach with modern CSS&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backend Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Serverless Computing&lt;/strong&gt;: Azure Functions with Python 3.11&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Database&lt;/strong&gt;: CosmosDB Table API for visitor counter persistence&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API Design&lt;/strong&gt;: RESTful endpoints with proper CORS handling&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Monitoring&lt;/strong&gt;: Built-in health checks and comprehensive logging&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DevOps Pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Source Control&lt;/strong&gt;: GitHub with organized repository structure&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CI/CD&lt;/strong&gt;: GitHub Actions for automated deployment&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure as Code&lt;/strong&gt;: ARM templates for reproducible deployments&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Testing&lt;/strong&gt;: Automated testing and validation workflows&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1: Building the Professional Frontend&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Resume Website&lt;/strong&gt;&lt;/p&gt;


&lt;p&gt;Abid Aslam&lt;br&gt;
CMPAK (Zong Pakistan Ltd) - Islamabad Lead SA &amp;amp; Project Manager for CRM / OCS / Rating / Mediation and Billing systems…&lt;br&gt;
&lt;a href="http://www.abidaslam.online" rel="noopener noreferrer"&gt;www.abidaslam.online&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I started with creating a clean, professional resume website that reflects modern design principles:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;
&lt;span class="nt"&gt;&amp;lt;&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt; &lt;span class="err"&gt;—&lt;/span&gt; &lt;span class="na"&gt;Key&lt;/span&gt; &lt;span class="na"&gt;features&lt;/span&gt; &lt;span class="na"&gt;of&lt;/span&gt; &lt;span class="na"&gt;my&lt;/span&gt; &lt;span class="na"&gt;resume&lt;/span&gt; &lt;span class="na"&gt;design&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt;

&lt;span class="na"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Responsive&lt;/span&gt; &lt;span class="na"&gt;layout&lt;/span&gt; &lt;span class="na"&gt;optimized&lt;/span&gt; &lt;span class="na"&gt;for&lt;/span&gt; &lt;span class="na"&gt;all&lt;/span&gt; &lt;span class="na"&gt;devices&lt;/span&gt;

&lt;span class="na"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Professional&lt;/span&gt; &lt;span class="na"&gt;typography&lt;/span&gt; &lt;span class="na"&gt;using&lt;/span&gt; &lt;span class="na"&gt;Inter&lt;/span&gt; &lt;span class="na"&gt;font&lt;/span&gt; &lt;span class="na"&gt;family&lt;/span&gt;

&lt;span class="na"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Semantic&lt;/span&gt; &lt;span class="na"&gt;HTML&lt;/span&gt; &lt;span class="na"&gt;structure&lt;/span&gt; &lt;span class="na"&gt;for&lt;/span&gt; &lt;span class="na"&gt;accessibility&lt;/span&gt;

&lt;span class="na"&gt;-&lt;/span&gt; &lt;span class="na"&gt;CSS&lt;/span&gt; &lt;span class="na"&gt;Grid&lt;/span&gt; &lt;span class="na"&gt;and&lt;/span&gt; &lt;span class="na"&gt;Flexbox&lt;/span&gt; &lt;span class="na"&gt;for&lt;/span&gt; &lt;span class="na"&gt;modern&lt;/span&gt; &lt;span class="na"&gt;layouts&lt;/span&gt;

&lt;span class="na"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Smooth&lt;/span&gt; &lt;span class="na"&gt;animations&lt;/span&gt; &lt;span class="na"&gt;and&lt;/span&gt; &lt;span class="na"&gt;transitions&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The website showcases my 15+ years of experience in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Telecom Operations and BSS Systems&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DevOps and Infrastructure Automation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cloud Architecture and Migration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kubernetes and Container Orchestration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Monitoring and Observability Solutions&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Advanced Blog System
&lt;/h2&gt;

&lt;p&gt;What sets my implementation apart is the comprehensive technical blog I built as part of the project. The blog features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;9 In-Depth Technical Articles&lt;/strong&gt; covering:&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Azure Resume Challenge complete walkthrough&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DevOps automation with GitHub Actions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kubernetes security and RBAC implementation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Azure Functions best practices&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Terraform infrastructure patterns&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Evolution from traditional monitoring to modern observability&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Legacy application containerization strategies&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Advanced troubleshooting methodologies&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enterprise-grade deployment patterns&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Interactive Modal System&lt;/strong&gt;: Professional reading experience with JavaScript-powered modals&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Technical Code Examples&lt;/strong&gt;: Real-world code snippets and architecture diagrams&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Responsive Design&lt;/strong&gt;: Optimized for desktop and mobile reading&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Phase 2: Serverless Backend Implementation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Azure Functions Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The backend implementation uses Azure Functions with several advanced features:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# Core visitor counter implementation
&lt;/span&gt;
&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="n"&gt;visitor&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="n"&gt;GET&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="n"&gt;POST&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="n"&gt;OPTIONS&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;

&lt;span class="n"&gt;auth_level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AuthLevel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ANONYMOUS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;visitor_counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HttpRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HttpResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="err"&gt;“””&lt;/span&gt;

&lt;span class="n"&gt;Azure&lt;/span&gt; &lt;span class="n"&gt;Function&lt;/span&gt; &lt;span class="n"&gt;HTTP&lt;/span&gt; &lt;span class="n"&gt;trigger&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;visitor&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt;

&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;GET&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Returns&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="n"&gt;visitor&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;

&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Increments&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;returns&lt;/span&gt; &lt;span class="n"&gt;visitor&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;

&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;OPTIONS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CORS&lt;/span&gt; &lt;span class="n"&gt;preflight&lt;/span&gt; &lt;span class="n"&gt;support&lt;/span&gt;

&lt;span class="err"&gt;“””&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
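The handler body (elided above) reduces to an upsert on a single table entity. A storage-free sketch of that logic, with a dict standing in for the CosmosDB table; the key names follow the Table API's PartitionKey/RowKey convention, but everything else here is illustrative:

```python
table = {}  # stands in for the CosmosDB Table API table

def visitor_counter(method):
    """GET returns the count; POST increments and returns it, as above."""
    key = ("visitors", "counter")  # (PartitionKey, RowKey)
    entity = table.setdefault(key, {"count": 0})
    if method == "POST":
        entity["count"] += 1
    return {"count": entity["count"]}

print(visitor_counter("POST"))  # {'count': 1}
print(visitor_counter("GET"))   # {'count': 1}
```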



&lt;p&gt;&lt;strong&gt;Database Integration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using CosmosDB Table API for reliable, scalable data persistence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# Advanced table storage management
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TableStorageManager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

&lt;span class="c1"&gt;# Connection string and key-based authentication
&lt;/span&gt;
&lt;span class="c1"&gt;# Proper error handling and logging
&lt;/span&gt;
&lt;span class="c1"&gt;# Retry logic and connection pooling
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;API Endpoints&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three comprehensive endpoints providing different functionality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;/api/visitor-counter&lt;/code&gt;: Main counter functionality&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;/api/visitor-stats&lt;/code&gt;: Detailed analytics and metadata&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;/api/health&lt;/code&gt;: Health monitoring and diagnostics&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Phase 3: Infrastructure as Code
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Azure Resource Management
&lt;/h3&gt;

&lt;p&gt;Implemented using ARM templates for complete infrastructure automation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;“parameters”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;“storageAccountName”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;“type”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“string”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;“metadata”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;“description”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;storage&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;account&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;static&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;website”&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;“functionAppName”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;“type”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“string”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;“metadata”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;“description”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Azure&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Function&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;App”&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
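Before running a deployment, a template like this can be sanity-checked with nothing but the standard json module. The required-parameter set below is this project's own choice, not an ARM rule:

```python
import json

# parameters this project expects (our choice, not an ARM requirement)
REQUIRED_PARAMETERS = {"storageAccountName", "functionAppName"}

def missing_parameters(template_text):
    """Return required parameters absent from an ARM template's parameters block."""
    template = json.loads(template_text)
    declared = set(template.get("parameters", {}))
    return REQUIRED_PARAMETERS - declared

template = '{"parameters": {"storageAccountName": {"type": "string"}}}'
print(missing_parameters(template))  # {'functionAppName'}
```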



&lt;h2&gt;
  
  
  Resource Organization
&lt;/h2&gt;


&lt;p&gt;&lt;strong&gt;Resource Groups&lt;/strong&gt;: Logical organization of all Azure resources&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Storage Accounts&lt;/strong&gt;: Static website hosting with CDN integration&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Function Apps&lt;/strong&gt;: Serverless compute with auto-scaling&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CosmosDB&lt;/strong&gt;: Global distribution with Table API&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Application Insights&lt;/strong&gt;: Comprehensive monitoring and analytics&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 4: Advanced CI/CD Pipeline
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub Actions Workflows&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Implemented separate workflows for frontend and backend deployments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# Frontend deployment workflow&lt;/span&gt;

&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy Frontend&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;main&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;‘*.html’&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;‘*.css’&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;‘*.js’&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

&lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Upload to Azure Storage&lt;/span&gt;

&lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure/ — — -&lt;/span&gt;

&lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;azcliversion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2.30.0&lt;/span&gt;

&lt;span class="na"&gt;inlineScript&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;

&lt;span class="err"&gt;a&lt;/span&gt;&lt;span class="s"&gt;z storage blob upload-batch \&lt;/span&gt;

&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="s"&gt; account-name ${{ secrets.AZURE_STORAGE_ACCOUNT }} \&lt;/span&gt;

&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="s"&gt; destination ‘$web’ \&lt;/span&gt;

&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="s"&gt; source . \&lt;/span&gt;

&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="s"&gt; auth-mode key&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Deployment Strategy&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Blue-Green Deployments: Zero-downtime updates&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Automated Testing: Integration and unit tests&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Environment Management: Separate dev/staging/production environments&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Rollback Capabilities: Automated rollback on failure&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Phase 5: Advanced Monitoring and Observability
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Application Insights Integration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Comprehensive monitoring covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Application Performance&lt;/strong&gt;: Response times, throughput, error rates&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Infrastructure Metrics&lt;/strong&gt;: CPU, memory, storage utilization&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Custom Metrics&lt;/strong&gt;: Business-specific KPIs and analytics&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Distributed Tracing&lt;/strong&gt;: End-to-end request tracking&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Health Monitoring&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Built-in health checks providing:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;“status”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“healthy”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;“timestamp”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="mi"&gt;2025&lt;/span&gt;&lt;span class="err"&gt;–&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="err"&gt;–&lt;/span&gt;&lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="err"&gt;T&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="mf"&gt;9.286391&lt;/span&gt;&lt;span class="err"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;“services”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;“table_storage”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“connected”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;“function_app”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“running”&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
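A monitoring script can consume this payload directly. A small sketch (the healthy-state strings mirror the sample response above; the function name is ours):

```python
import json

def is_healthy(payload):
    """True only if the overall status and every listed service look good."""
    if payload.get("status") != "healthy":
        return False
    ok_states = {"connected", "running"}
    return all(state in ok_states for state in payload.get("services", {}).values())

sample = json.loads('{"status": "healthy", "services": '
                    '{"table_storage": "connected", "function_app": "running"}}')
print(is_healthy(sample))  # True
```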



&lt;h2&gt;
  
  
  Challenges Overcome
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Azure Functions Deployment Issues&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Challenge: Initial deployment conflicts with WEBSITE_RUN_FROM_PACKAGE settings&lt;/p&gt;

&lt;p&gt;Solution: Implemented proper deployment configuration management and environment variable handling&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;CORS Configuration&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Challenge: Cross-origin requests between static website and Azure Functions&lt;/p&gt;

&lt;p&gt;Solution: Comprehensive CORS handling in function code with proper preflight support&lt;/p&gt;
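As a rough sketch of that preflight handling, framework-agnostic so it is easy to test; the allowed origin and function names are assumptions, not the production values:

```python
# Origins allowed to call the API - assumed value for illustration.
ALLOWED_ORIGINS = {"https://www.abidaslam.online"}

def cors_headers(origin: str) -> dict:
    """Build CORS response headers, echoing the origin only if allowed."""
    headers = {
        "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type",
        "Access-Control-Max-Age": "86400",  # let browsers cache the preflight
    }
    if origin in ALLOWED_ORIGINS:
        headers["Access-Control-Allow-Origin"] = origin
    return headers

def handle_request(method: str, origin: str):
    """Return (status, headers); an OPTIONS preflight gets 204 and no body."""
    headers = cors_headers(origin)
    if method == "OPTIONS":
        return 204, headers
    return 200, headers
```

In the actual Azure Function, these headers go into the `headers=` argument of the `HttpResponse`.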

&lt;ol start="3"&gt;
&lt;li&gt;CosmosDB Integration&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Challenge: Connecting Azure Functions to the CosmosDB Table API with proper authentication&lt;/p&gt;

&lt;p&gt;Solution: Implemented multiple authentication methods with fallback strategies&lt;/p&gt;
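The fallback order can be expressed as a small selector; the environment-variable names below are assumptions for illustration, not the production configuration:

```python
def pick_table_auth(env: dict) -> str:
    """Decide how to authenticate to the Table API, most specific first.

    Tries, in order: a full connection string, an endpoint plus account
    key, then an endpoint alone (managed identity). Raises if nothing
    usable is configured.
    """
    if env.get("COSMOS_CONNECTION_STRING"):
        return "connection_string"
    if env.get("COSMOS_ENDPOINT") and env.get("COSMOS_KEY"):
        return "shared_key"
    if env.get("COSMOS_ENDPOINT"):
        return "managed_identity"
    raise RuntimeError("no Table API credentials configured")
```

Each strategy then maps onto the `azure-data-tables` client: `TableServiceClient.from_connection_string(...)` for the first, or `TableServiceClient(endpoint, credential=...)` with an `AzureNamedKeyCredential` or `DefaultAzureCredential` for the others.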

&lt;ol start="4"&gt;
&lt;li&gt;Performance Optimization&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Challenge: Ensuring fast loading times and smooth user experience&lt;/p&gt;

&lt;p&gt;Solution: CDN integration, image optimization, and efficient caching strategies&lt;/p&gt;
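One way to sketch the caching side: choose a Cache-Control value per asset type before uploading to storage, long-lived for static assets and short for HTML so content updates propagate quickly. The exact max-age values here are assumptions:

```python
import os

# Cache lifetimes by file extension - illustrative values, not the
# production configuration.
CACHE_RULES = {
    ".html": "public, max-age=300",                      # HTML: 5 minutes
    ".css":  "public, max-age=31536000, immutable",      # assets: 1 year
    ".js":   "public, max-age=31536000, immutable",
    ".png":  "public, max-age=604800",                   # images: 1 week
}

def cache_control_for(path: str) -> str:
    """Return the Cache-Control header for a file, with a 1-hour default."""
    _, ext = os.path.splitext(path)
    return CACHE_RULES.get(ext.lower(), "public, max-age=3600")
```

When uploading with `azure-storage-blob`, the chosen value would be passed via `ContentSettings(cache_control=...)` so both the CDN edge and the browser honor it.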

&lt;p&gt;Technical Skills Demonstrated&lt;br&gt;
Through this project, I’ve showcased expertise in:&lt;/p&gt;

&lt;p&gt;Cloud Architecture&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Serverless Computing: Azure Functions for scalable backend services&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage Solutions: Azure Storage for static hosting and CosmosDB for data persistence&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CDN Integration: Azure Front Door for global content delivery&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Security: SSL/TLS, CORS, and proper authentication mechanisms&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DevOps Practices&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Infrastructure as Code: ARM templates for reproducible deployments&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CI/CD Pipelines: GitHub Actions with comprehensive testing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Monitoring: Application Insights and custom health checks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Documentation: Comprehensive README files and technical documentation&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Development Excellence&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Full-Stack Development: Modern HTML/CSS/JavaScript with Python backend&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;API Design: RESTful services with proper error handling&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Database Management: NoSQL design patterns and data modeling&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Performance Optimization: Efficient caching and content delivery&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Advanced Blog Content Creation&lt;br&gt;
One of the unique aspects of my implementation is the comprehensive technical blog featuring 9 detailed articles:&lt;/p&gt;

&lt;p&gt;Featured Articles&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;“Building the Perfect Azure Resume Challenge” — Complete walkthrough with architecture decisions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“DevOps Mastery: GitHub Actions for Azure Deployment” — Advanced CI/CD patterns&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“Kubernetes Security Hardening” — Enterprise RBAC and security best practices&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“Azure Functions Best Practices” — Performance, security, and scalability patterns&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“Infrastructure as Code with Terraform” — AWS enterprise deployment strategies&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“Monitoring Evolution: From Zabbix to Modern Observability” — 15+ years of monitoring experience&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“Containerizing Legacy Telecom Applications” — Real-world modernization strategies&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“Advanced Troubleshooting Methodologies” — Systematic problem-solving approaches&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“Enterprise Deployment Patterns” — Production-ready architecture patterns&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each article includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Real-world code examples&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Architecture diagrams&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Best practices and lessons learned&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Common pitfalls and solutions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Performance optimization techniques&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Future Enhancements
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Phase 6: Advanced Analytics
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;User Behavior Tracking: Detailed analytics on resume page interactions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A/B Testing: Continuous optimization of user experience&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Phase 7: Multi-Cloud Integration
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;AWS Integration: Cross-cloud deployment strategies&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hybrid Architecture: On-premises integration patterns&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Takeaways and Lessons Learned
&lt;/h2&gt;

&lt;p&gt;Technical Insights&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Serverless Architecture: Azure Functions provide excellent scalability and cost-effectiveness&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Static Site Hosting: Azure Storage offers robust, high-performance hosting for static content&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CI/CD Integration: GitHub Actions seamlessly integrates with Azure services&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Monitoring Importance: Comprehensive observability is crucial for production systems&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Professional Growth&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Full-Stack Skills: Combining backend expertise with modern frontend development&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cloud Architecture: Designing scalable, resilient cloud-native solutions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Technical Writing: Sharing knowledge through comprehensive documentation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Community Engagement: Contributing to the broader DevOps and cloud communities&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Project Results and Impact&lt;br&gt;
Website Performance&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Performance Metrics: Sub-second load times globally&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Security: Zero security incidents with proper SSL/TLS implementation&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Professional Impact&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Portfolio Showcase: Comprehensive demonstration of cloud and DevOps expertise&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Knowledge Sharing: Technical blog serving the community&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Career Development: Enhanced profile for cloud architecture opportunities&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Industry Recognition: Contributing to Azure Resume Challenge community&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Conclusion&lt;br&gt;
The Azure Resume Challenge has been far more than just building a resume website — it’s been a comprehensive journey through modern cloud architecture, DevOps best practices, and technical leadership. Through this project, I’ve demonstrated how 15+ years of traditional IT operations expertise translates to cutting-edge cloud solutions.&lt;/p&gt;

&lt;p&gt;The combination of professional resume presentation, advanced technical blog content, and enterprise-grade architecture showcases the evolution from traditional infrastructure management to modern cloud-native solutions.&lt;/p&gt;

&lt;p&gt;Key Achievements&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Complete Azure Architecture: Serverless backend, CDN-enabled frontend, and NoSQL database&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Advanced CI/CD Pipeline: Automated deployment with comprehensive testing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Professional Presentation: Modern, responsive design optimized for all devices&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Enterprise Patterns: Scalable, maintainable, and secure architecture&lt;/p&gt;

&lt;p&gt;What’s Next&lt;br&gt;
I’m excited to continue evolving this platform, adding advanced analytics, multi-cloud integration, and AI-powered features. The Azure Resume Challenge has provided an excellent foundation for showcasing cloud expertise and contributing to the broader DevOps community.&lt;/p&gt;

&lt;p&gt;For fellow engineers considering the Azure Resume Challenge, I highly recommend it as a comprehensive way to demonstrate your cloud skills while building something genuinely useful for your career.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Visit my live Azure Resume Challenge implementation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Resume Website: &lt;a href="https://www.abidaslam.online/" rel="noopener noreferrer"&gt;https://www.abidaslam.online/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Connect with me:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;LinkedIn: &lt;a href="https://www.linkedin.com/in/abid-aslam-75520330/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/abid-aslam-75520330/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Email: &lt;a href="mailto:abidaslam.123@gmail.com"&gt;abidaslam.123@gmail.com&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GitHub: abidaslam892&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Thank you for joining me on this cloud journey! I look forward to connecting with fellow cloud enthusiasts and sharing more advanced technical content.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;This article represents my personal experience with the Azure Resume Challenge and includes advanced patterns developed through 15+ years of enterprise IT operations and cloud architecture experience.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>career</category>
      <category>devops</category>
      <category>azure</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
