<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Prajwol Adhikari</title>
    <description>The latest articles on Forem by Prajwol Adhikari (@prajwol-ad).</description>
    <link>https://forem.com/prajwol-ad</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2832799%2F1a77016f-6d1e-4f3e-80b7-fce915def15f.jpg</url>
      <title>Forem: Prajwol Adhikari</title>
      <link>https://forem.com/prajwol-ad</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/prajwol-ad"/>
    <language>en</language>
    <item>
      <title>Part 2: Infrastructure as Code with Terraform, OIDC, and a GitOps Pipeline</title>
      <dc:creator>Prajwol Adhikari</dc:creator>
      <pubDate>Sun, 10 May 2026 01:01:53 +0000</pubDate>
      <link>https://forem.com/prajwol-ad/part-2-infrastructure-as-code-with-terraform-oidc-and-a-gitops-pipeline-1c1o</link>
      <guid>https://forem.com/prajwol-ad/part-2-infrastructure-as-code-with-terraform-oidc-and-a-gitops-pipeline-1c1o</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;In Part 1, I built a security-gated CI/CD pipeline for my portfolio site — Gitleaks, CodeQL, Lighthouse audits, and secretless OIDC deployment to GitHub Pages. That pipeline was about code delivery. Push code, run checks, deploy the site.&lt;/p&gt;

&lt;p&gt;But the whole time I was building that pipeline, the infrastructure underneath it — the DNS records, the cloud servers, the network configuration — was still managed by hand. I would log into Cloudflare, click around to add a DNS record. Log into Oracle Cloud, click through a wizard to resize an instance. If something broke, I would try to remember what I had changed and where.&lt;/p&gt;

&lt;p&gt;That is fine when you have two or three things to manage. I had thirteen DNS records across multiple subdomains, a Cloudflare Tunnel configuration, an Oracle Cloud VCN with a subnet and a compute instance, and an AWS S3 bucket holding my Terraform state. Keeping track of all of that by clicking through dashboards was starting to feel like a job I was doing badly.&lt;/p&gt;

&lt;p&gt;So Part 2 is about bringing all of that under code. Every DNS record, every cloud resource, defined in Terraform files, stored in GitHub, and deployed through a pipeline. No more dashboard clicking. No more "wait, did I change that setting last week or was it always like that?"&lt;/p&gt;

&lt;p&gt;This one took longer than Part 1. There were more moving parts, more credentials to manage, and a migration that I was genuinely nervous about. But it is done, and my infrastructure is now as version-controlled as my code.&lt;/p&gt;




&lt;h3&gt;
  
  
  See it live
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://prajwolbikramadhikari.com.np/lab/" rel="noopener noreferrer"&gt;The Lab&lt;/a&gt; — live infrastructure status and build progress tracker&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://prajwolbikramadhikari.com.np/architecture/" rel="noopener noreferrer"&gt;Architecture diagram&lt;/a&gt; — five-zone infrastructure map spanning Waco TX, Phoenix AZ, and Amsterdam NL&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  What is Infrastructure as Code and why should you care?
&lt;/h3&gt;

&lt;p&gt;The concept is simple: instead of configuring infrastructure by clicking through web dashboards, you write code that describes what you want to exist. Then a tool reads that code and creates it for you.&lt;/p&gt;

&lt;p&gt;The code becomes your documentation. If someone asks "what DNS records does your domain have?", you do not need to log into Cloudflare and screenshot the dashboard. You point them at a file. If you need to recreate everything from scratch — disaster recovery, new environment, new cloud account — you run one command instead of spending a day clicking through consoles trying to remember every setting.&lt;/p&gt;

&lt;p&gt;But the part that really sold me on it was the diff. When you change a Terraform file and run &lt;code&gt;terraform plan&lt;/code&gt;, it shows you exactly what will change before anything happens. "I am going to add this DNS record, modify this subnet rule, and leave everything else alone." You review it, confirm it, and only then does it apply. Compare that to clicking "Save" in a dashboard and hoping you did not just break something.&lt;/p&gt;
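&lt;p&gt;For a sense of what that review step looks like, here is roughly the shape of a &lt;code&gt;terraform plan&lt;/code&gt; for adding a single DNS record (abridged and illustrative, not an exact transcript):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Terraform will perform the following actions:

  # module.cloudflare.cloudflare_record.tunnel_grafana will be created
  + resource "cloudflare_record" "tunnel_grafana" {
      + name    = "grafana"
      + type    = "CNAME"
      + proxied = true
      ...
    }

Plan: 1 to add, 0 to change, 0 to destroy.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;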

&lt;p&gt;In my day job at AbbVie, we do not make changes to production systems without documentation and review. That is what cGMP requires. Terraform brings that same discipline to infrastructure — every change is tracked, reviewed, and auditable.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 1: The Module Structure
&lt;/h3&gt;

&lt;p&gt;Before writing any Terraform, I had to decide how to organize the code. Terraform lets you put everything in one big file, but that gets messy fast when you are managing resources across multiple cloud providers.&lt;/p&gt;

&lt;p&gt;I went with a modular structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;homelab-iac/
├── backend.tf              # Where Terraform stores its state
├── main.tf                 # Calls the modules, passes variables
├── variables.tf            # Declares all input variables
├── terraform.tfvars        # Actual secret values (gitignored)
├── modules/
│   ├── cloudflare/
│   │   ├── main.tf         # All 13 DNS record resources
│   │   └── variables.tf    # Cloudflare-specific variables
│   └── oracle/
│       ├── main.tf         # VCN, subnet, compute instance
│       └── variables.tf    # OCI-specific variables
└── .github/
    └── workflows/
        ├── terraform-plan.yml
        └── terraform-apply.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The idea is separation of concerns. The Cloudflare module knows how to manage DNS records. The Oracle module knows how to manage cloud infrastructure. &lt;code&gt;main.tf&lt;/code&gt; connects them. If I add a third cloud provider later, I add a third module without touching the existing ones.&lt;/p&gt;
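&lt;p&gt;As a sketch of that wiring (variable names are from my setup and illustrative), a module call in &lt;code&gt;main.tf&lt;/code&gt; looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;module "cloudflare" {
  source             = "./modules/cloudflare"
  cloudflare_zone_id = var.cloudflare_zone_id
  tunnel_id          = var.tunnel_id
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;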

&lt;p&gt;The &lt;code&gt;terraform.tfvars&lt;/code&gt; file contains the actual secret values — API tokens, OCIDs, private keys. It is in &lt;code&gt;.gitignore&lt;/code&gt; and never gets committed to GitHub. More on how the pipeline handles this later.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 2: Cloudflare DNS — 13 Records as Code
&lt;/h3&gt;

&lt;p&gt;Before Terraform, my DNS setup was a collection of manually created records in the Cloudflare dashboard. I knew roughly what they all did, but I could not have listed all thirteen from memory. Converting them to code forced me to actually understand each one.&lt;/p&gt;

&lt;p&gt;Here is what I am managing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Portfolio site:&lt;/strong&gt; Two CNAME records pointing the root domain and www subdomain to Netlify, where my Hugo site is hosted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cloudflare Tunnel records:&lt;/strong&gt; Seven CNAME records, one for each service I expose through the tunnel — Grafana, Prometheus, AdGuard, Homer, n8n, Nginx Proxy Manager, and cAdvisor. Each one points to the tunnel ID so traffic routes through Cloudflare's edge network instead of hitting my home IP directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Email authentication:&lt;/strong&gt; Three records — SPF, DKIM, and DMARC. These are TXT records that prove to email servers that mail claiming to come from my domain is legitimate. I do not actually send email from this domain, but having these records prevents someone else from spoofing it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub Pages verification:&lt;/strong&gt; A TXT record that proves to GitHub that I own the domain, required for the custom domain configuration on GitHub Pages.&lt;/p&gt;

&lt;p&gt;A single DNS record in Terraform looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"cloudflare_record"&lt;/span&gt; &lt;span class="s2"&gt;"tunnel_grafana"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;zone_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudflare_zone_id&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"grafana"&lt;/span&gt;
  &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.tunnel_id}.cfargotunnel.com"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"CNAME"&lt;/span&gt;
  &lt;span class="nx"&gt;proxied&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;ttl&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is it. Six lines. If I need to change the Grafana subdomain or point it somewhere else, I change one line of code, open a pull request, review the plan, and merge. The pipeline applies it automatically.&lt;/p&gt;

&lt;p&gt;The Cloudflare provider authenticates using an API token scoped to Zone:Zone:Read and Zone:DNS:Edit on my specific zone only. Least privilege — the token cannot touch anything outside my domain.&lt;/p&gt;
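&lt;p&gt;The provider configuration itself is minimal — something along these lines (the variable name is mine):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;provider "cloudflare" {
  api_token = var.cloudflare_api_token
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;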




&lt;h3&gt;
  
  
  Chapter 3: Oracle Cloud — The Network and the Server
&lt;/h3&gt;

&lt;p&gt;The Oracle module manages three resources that form a complete cloud environment:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VCN (Virtual Cloud Network):&lt;/strong&gt; A private network inside Oracle Cloud. Think of it as creating your own isolated LAN in the cloud. Resources inside the VCN can talk to each other, but the outside world cannot reach them unless you explicitly allow it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Subnet:&lt;/strong&gt; A subdivision of the VCN that defines the IP address range and routing rules. My compute instance sits in this subnet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compute Instance:&lt;/strong&gt; The actual virtual machine — an Ampere ARM instance with 4 OCPUs and 24GB of RAM, running Ubuntu 24.04. This is my cloud server in Phoenix, Arizona. It runs AdGuard Home for DNS ad-blocking, Ollama with DeepSeek-R1 for the AI log summarizer, and Node Exporter for monitoring.&lt;/p&gt;

&lt;p&gt;All three resources are defined in &lt;code&gt;modules/oracle/main.tf&lt;/code&gt;. If Oracle ever reclaims the instance (it happens on the Always Free tier), I can recreate the entire network and server by running the pipeline. Everything comes back exactly as defined — same VCN, same subnet, same instance shape and configuration.&lt;/p&gt;
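&lt;p&gt;As a rough sketch of the instance definition (attribute names follow the OCI Terraform provider; the resource name is illustrative and most arguments are omitted):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;resource "oci_core_instance" "homelab" {
  compartment_id      = var.compartment_id
  availability_domain = var.availability_domain
  shape               = "VM.Standard.A1.Flex"   # Ampere ARM, Always Free eligible

  shape_config {
    ocpus         = 4
    memory_in_gbs = 24
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;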

&lt;p&gt;One thing Terraform does not manage here is what runs inside the instance. Docker, Ollama, AdGuard — those were all set up manually via SSH. Terraform creates the machine. Configuring what is on it is a different tool's job — Ansible, probably, in a future phase.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 4: Remote State — And Why It Matters More Than You Think
&lt;/h3&gt;

&lt;p&gt;When Terraform creates a resource, it writes a record of that resource to a state file. The state file is how Terraform knows what already exists, so it can figure out what needs to change on the next run.&lt;/p&gt;

&lt;p&gt;By default, the state file lives on your local machine. That works for one person on one laptop, but it has two serious problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;If your laptop dies, you lose the state. Terraform no longer knows what exists. You either import every resource manually or start over.&lt;/li&gt;
&lt;li&gt;If two people (or two pipeline runs) execute Terraform at the same time, they can corrupt the state by writing to it simultaneously.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I solved both problems by storing state remotely in AWS S3:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;backend&lt;/span&gt; &lt;span class="s2"&gt;"s3"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;bucket&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"your-terraform-state-bucket"&lt;/span&gt;
    &lt;span class="nx"&gt;key&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"homelab/terraform.tfstate"&lt;/span&gt;
    &lt;span class="nx"&gt;region&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;
    &lt;span class="nx"&gt;use_lockfile&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;encrypt&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The state file lives in an S3 bucket, encrypted at rest. If my local server catches fire, the state is safe in AWS.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;use_lockfile = true&lt;/code&gt; line enables S3-native state locking. When Terraform runs, it creates a &lt;code&gt;.tflock&lt;/code&gt; file in the bucket. If a second process tries to run simultaneously, it sees the lock and waits. No corruption possible.&lt;/p&gt;

&lt;p&gt;I originally used a DynamoDB table for state locking — that was the standard approach for years. But Terraform 1.10 introduced S3-native locking, and as of 1.11 the DynamoDB approach is deprecated. I migrated by changing one line in &lt;code&gt;backend.tf&lt;/code&gt;, running &lt;code&gt;terraform init -reconfigure&lt;/code&gt;, and deleting the DynamoDB table. The whole migration took about five minutes and simplified my AWS footprint.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 5: The Pipeline — Plan on PR, Apply on Merge
&lt;/h3&gt;

&lt;p&gt;Having Terraform code in GitHub is nice. Having it automatically validate and deploy is the real goal.&lt;/p&gt;

&lt;p&gt;I created two GitHub Actions workflows:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;terraform-plan.yml&lt;/strong&gt; triggers on every pull request to main. It runs &lt;code&gt;terraform fmt -check&lt;/code&gt; (is the code formatted correctly?), &lt;code&gt;terraform validate&lt;/code&gt; (is the syntax valid?), and &lt;code&gt;terraform plan&lt;/code&gt; (what would change?). If any step fails, the PR is blocked.&lt;/p&gt;
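&lt;p&gt;Condensed, the plan workflow has roughly this shape (step names and minor details are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;on:
  pull_request:
    branches: [main]

permissions:
  id-token: write   # required for the OIDC credential exchange
  contents: read

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform fmt -check -recursive
      - run: terraform init
      - run: terraform validate
      - run: terraform plan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;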

&lt;p&gt;&lt;strong&gt;terraform-apply.yml&lt;/strong&gt; triggers when code is merged to main. It runs &lt;code&gt;terraform apply -auto-approve&lt;/code&gt;, actually making the infrastructure changes.&lt;/p&gt;

&lt;p&gt;The plan workflow is the review step. When I open a PR that adds a DNS record, the plan output shows exactly what will be created. I read it, confirm it looks right, and merge. The apply workflow does the rest.&lt;/p&gt;

&lt;p&gt;This is the same GitOps pattern used by platform engineering teams at companies much larger than my homelab. Git is the source of truth. Every change goes through a PR, gets validated by the pipeline, and is applied automatically on merge. The Git history becomes an audit log of every infrastructure change.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 6: OIDC — The Part That Changed How I Think About Credentials
&lt;/h3&gt;

&lt;p&gt;In Part 1, I used OIDC to deploy to GitHub Pages without stored tokens. In Part 2, I used the same concept for something more complex — authenticating GitHub Actions to AWS.&lt;/p&gt;

&lt;p&gt;The old way would be to create an AWS access key and secret key, store them in GitHub Secrets, and reference them in the workflow. Those keys never expire. If someone compromises your repo or your secrets leak, they have permanent access to your AWS account.&lt;/p&gt;

&lt;p&gt;OIDC flips this around. I configured three things in AWS:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;An OIDC Identity Provider&lt;/strong&gt; — tells AWS "I know what GitHub Actions is and I trust their identity tokens."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An IAM Role&lt;/strong&gt; with a trust policy scoped to my specific repo — tells AWS "only workflows running from &lt;code&gt;&amp;lt;your-username&amp;gt;/&amp;lt;your-repo&amp;gt;&lt;/code&gt; can assume this role."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An inline policy&lt;/strong&gt; — tells AWS "this role can only read and write to the S3 state bucket. Nothing else."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When the pipeline runs, GitHub generates a short-lived token proving it is a workflow from my repo. AWS verifies the token, checks it against the trust policy, and hands back temporary credentials that expire in one hour. The pipeline uses those credentials, finishes its work, and the credentials disappear.&lt;/p&gt;
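&lt;p&gt;The trust policy condition is where the repo scoping happens. A sketch of the relevant statement (account ID and repo path are placeholders, and the &lt;code&gt;Version&lt;/code&gt; wrapper is omitted):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "Effect": "Allow",
  "Principal": {
    "Federated": "arn:aws:iam::&amp;lt;account-id&amp;gt;:oidc-provider/token.actions.githubusercontent.com"
  },
  "Action": "sts:AssumeRoleWithWebIdentity",
  "Condition": {
    "StringLike": {
      "token.actions.githubusercontent.com:sub": "repo:&amp;lt;your-username&amp;gt;/&amp;lt;your-repo&amp;gt;:*"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;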

&lt;p&gt;No permanent keys. Nothing stored in secrets. Nothing to rotate. Nothing to leak.&lt;/p&gt;

&lt;p&gt;The workflow step is surprisingly simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Configure AWS credentials&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-actions/configure-aws-credentials@v4&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;role-to-assume&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.AWS_ROLE_ARN }}&lt;/span&gt;
    &lt;span class="na"&gt;aws-region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-east-1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The only secret stored in GitHub is the ARN (Amazon Resource Name) of the IAM role — which is not a credential. It is just an identifier. The actual authentication happens through the OIDC handshake at runtime.&lt;/p&gt;

&lt;p&gt;This was the piece that genuinely shifted how I think about credential management. In my previous work, I had always treated API keys as "generate once, store somewhere, hope nobody finds them." OIDC eliminates the "hope" part entirely.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 7: Handling Secrets in the Pipeline
&lt;/h3&gt;

&lt;p&gt;The pipeline needs credentials for three providers — AWS, Cloudflare, and Oracle Cloud. Each one is handled differently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS:&lt;/strong&gt; OIDC, as described above. No stored credentials.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cloudflare:&lt;/strong&gt; An API token stored as a GitHub Secret (&lt;code&gt;CLOUDFLARE_API_TOKEN&lt;/code&gt;). The token is scoped to Zone:Zone:Read and Zone:DNS:Edit on my specific zone. I rotated the token during this setup — generated a new one, added it to GitHub Secrets, revoked the old one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Oracle Cloud:&lt;/strong&gt; Multiple values stored as GitHub Secrets — tenancy OCID, user OCID, fingerprint, region, compartment ID, SSH public key, and a private key. The tricky one was the private key. On my local machine, Terraform reads it from a file at &lt;code&gt;~/.oci/oci_api_key.pem&lt;/code&gt;. That file does not exist in GitHub Actions. So the workflow writes the key content from the secret to a temporary file before Terraform runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Write OCI private key&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;mkdir -p ~/.oci&lt;/span&gt;
    &lt;span class="s"&gt;echo "${{ secrets.OCI_PRIVATE_KEY }}" &amp;gt; ~/.oci/oci_api_key.pem&lt;/span&gt;
    &lt;span class="s"&gt;chmod 600 ~/.oci/oci_api_key.pem&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;chmod 600&lt;/code&gt; ensures only the current user can read the key — same as on your local machine. The file exists only for the duration of the workflow run and is destroyed when the runner is cleaned up.&lt;/p&gt;

&lt;p&gt;All the secret values are passed to Terraform using the &lt;code&gt;TF_VAR_&lt;/code&gt; prefix convention. Terraform automatically reads any environment variable starting with &lt;code&gt;TF_VAR_&lt;/code&gt; and maps it to the corresponding variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;TF_VAR_cloudflare_api_token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.CLOUDFLARE_API_TOKEN }}&lt;/span&gt;
  &lt;span class="na"&gt;TF_VAR_oci_tenancy_ocid&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.OCI_TENANCY_OCID }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means &lt;code&gt;terraform.tfvars&lt;/code&gt; is only needed locally. The pipeline gets its values from GitHub Secrets and environment variables instead.&lt;/p&gt;
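&lt;p&gt;On the Terraform side, each of those environment variables needs a matching declaration in &lt;code&gt;variables.tf&lt;/code&gt; — for example (a sketch, with &lt;code&gt;sensitive&lt;/code&gt; set so the value is redacted from plan output):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;variable "cloudflare_api_token" {
  type      = string
  sensitive = true
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;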




&lt;h3&gt;
  
  
  Chapter 8: Branch Protection — Closing the Loop
&lt;/h3&gt;

&lt;p&gt;A pipeline is only as strong as the rules that enforce it. Without branch protection, nothing stops you from pushing directly to main at midnight and bypassing all the checks.&lt;/p&gt;

&lt;p&gt;I created a ruleset on the repository:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Require a pull request before merging&lt;/strong&gt; — no direct pushes to main&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Require the &lt;code&gt;plan&lt;/code&gt; status check to pass&lt;/strong&gt; — merge is blocked until Terraform plan succeeds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Block force pushes&lt;/strong&gt; — no rewriting history on main&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now the only way to change infrastructure is: branch → commit → push → open PR → plan runs → review → merge → apply runs. No shortcuts. Not even for the repo owner.&lt;/p&gt;

&lt;p&gt;It felt slightly paranoid to lock myself out of my own main branch. But then I remembered that the one time I would want to bypass the pipeline is exactly the time I should not — late at night, tired, "just this one quick fix." The branch protection is there for that version of me.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 9: The Migration That Made Me Nervous
&lt;/h3&gt;

&lt;p&gt;Everything I have described so far was building something new. But there was one part that involved changing something that already existed — migrating from DynamoDB state locking to S3-native locking.&lt;/p&gt;

&lt;p&gt;The state file is the single most important file in a Terraform setup. If it gets corrupted or lost, Terraform loses track of every resource it manages. You do not casually mess with how the state file is stored.&lt;/p&gt;

&lt;p&gt;The actual migration was anticlimactic. I changed one line in &lt;code&gt;backend.tf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;-    dynamodb_table = "terraform-lock-table"
&lt;/span&gt;&lt;span class="gi"&gt;+    use_lockfile   = true
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ran &lt;code&gt;terraform init -reconfigure&lt;/code&gt;. Ran &lt;code&gt;terraform plan&lt;/code&gt;. It showed no changes — meaning Terraform could still read the state and nothing had drifted. I deleted the DynamoDB table in AWS. Done.&lt;/p&gt;

&lt;p&gt;But the fact that I was nervous about it taught me something about infrastructure work. The migration itself took five minutes. The caution I felt — checking the plan output twice, making sure I could roll back — that is the right instinct. In production, you do not rush infrastructure changes just because the technical step is simple.&lt;/p&gt;




&lt;h3&gt;
  
  
  What I Took Away From This
&lt;/h3&gt;

&lt;p&gt;Part 1 taught me CI/CD. Part 2 taught me that the same principles — version control, automated validation, review before deploy — apply to infrastructure just as well as they apply to code.&lt;/p&gt;

&lt;p&gt;The specific tools matter less than the pattern. Terraform could be replaced by Pulumi or OpenTofu. GitHub Actions could be replaced by GitLab CI or CircleCI. S3 could be replaced by GCS or Azure Blob Storage. The pattern stays the same: define infrastructure in code, store the code in version control, validate changes automatically, deploy through a pipeline, and never make changes by hand.&lt;/p&gt;

&lt;p&gt;The part I am most proud of is the OIDC setup. Not because it was technically difficult — it was about an hour of work — but because it represents a genuine shift in how I think about security. Moving from "store a key and hope it does not leak" to "there is no key to leak" is the kind of change that sticks with you.&lt;/p&gt;

&lt;p&gt;Building this also made me realize how much of DevOps is about discipline, not tooling. The pipeline does not do anything I could not do manually. But it does it the same way every time, it does it on every change without exception, and it leaves a record. That consistency is the actual value.&lt;/p&gt;




&lt;h3&gt;
  
  
  What is Next?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Part 3&lt;/strong&gt; brings Kubernetes into the homelab. I will be setting up a K3s cluster with my local server as the control plane node and the Oracle Cloud instance as a worker node — a geographically distributed cluster connected over Tailscale. Same discipline: infrastructure as code, pipeline-driven, documented.&lt;/p&gt;

&lt;p&gt;The container orchestration layer is where everything built in Parts 1 and 2 starts to converge. The CI/CD pipeline from Part 1 will build and push container images. The Terraform infrastructure from Part 2 will provision the nodes. Kubernetes will run the workloads.&lt;/p&gt;

&lt;p&gt;Stay tuned, and happy building.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>aws</category>
      <category>github</category>
      <category>terraform</category>
    </item>
    <item>
      <title>Part 1: Building a Security-Gated CI/CD Pipeline with GitHub Actions</title>
      <dc:creator>Prajwol Adhikari</dc:creator>
      <pubDate>Sun, 10 May 2026 00:58:23 +0000</pubDate>
      <link>https://forem.com/prajwol-ad/part-1-building-a-security-gated-cicd-pipeline-with-github-actions-27k8</link>
      <guid>https://forem.com/prajwol-ad/part-1-building-a-security-gated-cicd-pipeline-with-github-actions-27k8</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;If you have followed along with the homelab series, you have seen me build a Debian server from scratch, lock it down with Zero Trust tunnels, and set up high-availability DNS across two continents. The infrastructure side has been a lot of fun to learn.&lt;/p&gt;

&lt;p&gt;But there was something that had been nagging me. Every time I pushed code to this portfolio, it deployed automatically with zero checks. No secret scanning, no security analysis, no performance gates. One bad push and the site could break — or worse, I could accidentally leak a token and not even know it.&lt;/p&gt;

&lt;p&gt;As I have been working toward a DevOps engineering role, CI/CD pipelines have been one of those topics I kept reading about but had not actually built from scratch myself. I am the kind of person who learns best by doing — reading documentation only gets me so far. I wanted to actually build something real, something I could showcase, and something that would teach me the tools that professional engineering teams use every day.&lt;/p&gt;

&lt;p&gt;So that is what this post is about. Building a proper &lt;strong&gt;multi-stage, security-gated CI/CD pipeline&lt;/strong&gt; using GitHub Actions — not because a tutorial told me to, but because I wanted to understand how it actually works.&lt;/p&gt;

&lt;p&gt;Fair warning: it did not all go smoothly. There were failed runs, confusing errors, and at least one moment where I had no idea why the build was failing. I will walk you through all of it.&lt;/p&gt;




&lt;h3&gt;
  
  
  See it live
&lt;/h3&gt;

&lt;p&gt;This post documents the first phase of a five-phase hybrid cloud engineering showcase. If you want to see the current state of the infrastructure before reading the build walkthrough:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://prajwolbikramadhikari.com.np/lab/" rel="noopener noreferrer"&gt;The Lab&lt;/a&gt; — live infrastructure status, technology stack,
and build progress tracker&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://prajwolbikramadhikari.com.np/architecture/" rel="noopener noreferrer"&gt;Architecture diagram&lt;/a&gt; — five-zone infrastructure
map spanning Waco TX, Phoenix AZ, and Amsterdam NL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The lab page updates as each phase completes. By the time you read this, Phase 2 Terraform IaC may already be marked complete.&lt;/p&gt;


&lt;h3&gt;
  
  
  What is CI/CD and why does it matter?
&lt;/h3&gt;

&lt;p&gt;Before I built this, I had a loose understanding of CI/CD. "Code goes in, site comes out automatically." That is technically true but it misses the point.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CI (Continuous Integration)&lt;/strong&gt; means every push automatically triggers a set of checks — security scans, builds, audits. Every single push. Not once a week before a release.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CD (Continuous Deployment)&lt;/strong&gt; means if all those checks pass, your code goes to production automatically. No human clicking deploy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The real value is not the automation itself — it is the &lt;em&gt;gates&lt;/em&gt;. Without CI/CD, you are trusting yourself to manually check everything every time. That works until it does not. The day you are tired, rushing, or just distracted, something slips through.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DevSecOps&lt;/strong&gt; takes this a step further by making security part of the pipeline itself. Security checks run on every push, blocking anything that does not meet the standard. Given that I am working in a regulated pharmaceutical environment at AbbVie where GxP compliance is part of daily life, this mindset clicked for me immediately. You do not check compliance once a quarter. You build it into the process.&lt;/p&gt;


&lt;h3&gt;
  
  
  Chapter 1: The Pipeline Architecture
&lt;/h3&gt;

&lt;p&gt;Before I wrote a single line of YAML, I spent time thinking about how the jobs should depend on each other. This turned out to be one of the most valuable parts of the whole exercise.&lt;/p&gt;

&lt;p&gt;Here is the structure I landed on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git push to master
        │
        ├── Gitleaks      (secret scanning)       ┐
        ├── CodeQL        (SAST analysis)         ├── parallel
        └── Dep Review    (CVE scanning)          ┘
                │  all three must pass
                ▼
        ┌─────────────────────────────┐
        │   Containerized Hugo build  │
        │   Docker · pinned version   │
        └─────────────────────────────┘
                │
                ▼
        ┌─────────────────────────────┐
        │   Lighthouse audit          │
        │   Performance ≥ 90          │
        │   Accessibility ≥ 90        │
        └─────────────────────────────┘
                │
                ▼
        ┌─────────────────────────────┐
        │   Deploy to GitHub Pages    │
        │   OIDC · no stored tokens   │
        └─────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key decision is running the three security gates &lt;strong&gt;in parallel&lt;/strong&gt;, not one after another. Gitleaks, CodeQL, and dependency review do not depend on each other — there is no reason to wait for Gitleaks to finish before starting CodeQL. Running them simultaneously means the whole security check phase takes as long as the slowest single scan, not the sum of all three.&lt;/p&gt;

&lt;p&gt;The build only starts once all three pass. If any one of them fails, the whole pipeline stops there.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 2: Security Gate 1 — Gitleaks Secret Scanning
&lt;/h3&gt;

&lt;p&gt;Gitleaks scans your repository for accidentally committed secrets — API keys, tokens, passwords, private keys. This was one of those tools I had heard about but never actually used. Setting it up was straightforward. Understanding why one specific line matters took me longer.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;secret-scan:
  name: Gitleaks
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0
    - uses: gitleaks/gitleaks-action@v2
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;That &lt;code&gt;fetch-depth: 0&lt;/code&gt; line is the important one. Without it, GitHub only checks out your latest commit. But git history keeps everything — if you committed a Cloudflare token six months ago and deleted it in the next commit, that token is still visible to anyone who clones your repo and runs &lt;code&gt;git log&lt;/code&gt;. Setting &lt;code&gt;fetch-depth: 0&lt;/code&gt; tells Gitleaks to scan the entire history, not just the tip.&lt;/p&gt;

&lt;p&gt;A scanner that only sees the latest commit is security theater. The full history scan is what makes it real.&lt;/p&gt;
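&lt;p&gt;If you want to convince yourself, here is a throwaway demo in a scratch repository — every file name and token in it is made up:&lt;br&gt;
&lt;/p&gt;

```shell
# Throwaway demo (safe to run anywhere): deleting a secret in a later
# commit does not remove it from git history.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email demo@example.com
git config user.name demo

# Pretend mistake: commit a fake token, then "fix" it by deleting the file.
echo 'cf_token = "FAKE-cf-token-12345"' > config.toml
git add config.toml
git commit -qm "add config"
git rm -q config.toml
git commit -qm "remove leaked token"

# The file is gone from the working tree, but history still has it:
git log -S 'FAKE-cf-token' --oneline   # lists both commits that touched it
git show HEAD~1:config.toml            # prints the "deleted" token
```

&lt;p&gt;Which is also why, once a real credential has ever been committed, the reliable fix is rotating it — scanning and history rewriting come second.&lt;/p&gt;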

&lt;p&gt;If you have test files or documentation with example tokens that look like real secrets, add a &lt;code&gt;.gitleaks.toml&lt;/code&gt; to your repo root to suppress false positives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[allowlist]&lt;/span&gt;
&lt;span class="py"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Known false positives"&lt;/span&gt;
&lt;span class="py"&gt;regexes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="s"&gt;'''EXAMPLE_API_KEY'''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s"&gt;'''test-token-placeholder'''&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;paths&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="s"&gt;'''testdata/'''&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Chapter 3: Security Gate 2 — CodeQL Static Analysis
&lt;/h3&gt;

&lt;p&gt;CodeQL is GitHub's free SAST tool — Static Application Security Testing. Instead of looking for secrets, it reads your actual code and looks for patterns that could be exploited: XSS vulnerabilities, injection risks, insecure patterns in JavaScript.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;codeql&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CodeQL&lt;/span&gt;
  &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
  &lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;security-events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;
    &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;
  &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github/codeql-action/init@v3&lt;/span&gt;
      &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;languages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;javascript&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github/codeql-action/autobuild@v3&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github/codeql-action/analyze@v3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One thing worth noticing: &lt;code&gt;security-events: write&lt;/code&gt; is declared at the &lt;strong&gt;job level&lt;/strong&gt;, not globally. This is the principle of least privilege — each job only gets the permissions it actually needs. The global permissions block deliberately does not include this. Only CodeQL needs to write security events, so only CodeQL gets that permission.&lt;/p&gt;

&lt;p&gt;Results appear in your repository's &lt;strong&gt;Security → Code scanning alerts&lt;/strong&gt; tab. For a static Hugo portfolio, CodeQL will likely find nothing — which is the expected and correct result. The habit and the architecture are what matter.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 4: Security Gate 3 — Dependency Review
&lt;/h3&gt;

&lt;p&gt;The dependency review action checks your packages against the GitHub Advisory Database on every pull request. If any dependency has a known CVE at &lt;code&gt;high&lt;/code&gt; severity or above, the pipeline fails.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;dependency-review&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dependency Review&lt;/span&gt;
  &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
  &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github.event_name == 'pull_request'&lt;/span&gt;
  &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/dependency-review-action@v4&lt;/span&gt;
      &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;fail-on-severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;high&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;if: github.event_name == 'pull_request'&lt;/code&gt; condition is here because this action requires a base and head ref to compare — it's designed specifically for pull requests. On a direct push to master there's no base to compare against, so it would fail with a confusing error.&lt;/p&gt;

&lt;p&gt;The correct engineering response is not to remove the job — it is to scope it to the right trigger. It runs on PRs, skips silently on direct pushes.&lt;/p&gt;

&lt;p&gt;Because it can be skipped, the build job needs a special condition to prevent a cascade where a skipped job causes everything downstream to skip too:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;secret-scan&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;codeql&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;dependency-review&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ !failure() &amp;amp;&amp;amp; !cancelled() }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This condition is applied to the build, Lighthouse, and deploy jobs. Each one proceeds as long as nothing upstream actually failed or was cancelled — a skipped dependency review on a direct push does not block the rest of the pipeline.&lt;/p&gt;


&lt;h3&gt;
  
  
  Chapter 5: The Containerized Build — And Where I Got Stuck
&lt;/h3&gt;

&lt;p&gt;This is the stage where things got interesting. And by interesting, I mean frustrating for a while.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build with Docker&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;docker run --rm \&lt;/span&gt;
      &lt;span class="s"&gt;--user $(id -u):$(id -g) \&lt;/span&gt;
      &lt;span class="s"&gt;-v ${{ github.workspace }}:/src \&lt;/span&gt;
      &lt;span class="s"&gt;-w /src \&lt;/span&gt;
      &lt;span class="s"&gt;floryn90/hugo:0.120.4-ext-alpine \&lt;/span&gt;
      &lt;span class="s"&gt;--minify --gc&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first few runs of the pipeline kept failing at the build stage. The error was not obvious — it was a permissions issue. Hugo was running as root inside the Docker container, writing the &lt;code&gt;public/&lt;/code&gt; directory with root ownership. The next pipeline step could not read those files.&lt;/p&gt;

&lt;p&gt;The fix is the &lt;code&gt;--user $(id -u):$(id -g)&lt;/code&gt; flag. This tells Docker to run the Hugo process as the current runner user instead of root, which means the output files are owned by the right user and everything downstream can read them cleanly.&lt;/p&gt;

&lt;p&gt;It is not something you would find in a basic Docker tutorial. You find it by having the pipeline break and digging into why.&lt;/p&gt;

&lt;p&gt;The other decision worth explaining: &lt;code&gt;floryn90/hugo:0.120.4-ext-alpine&lt;/code&gt; with a pinned version, not &lt;code&gt;latest&lt;/code&gt;. GitHub's runners update their tool versions regularly. If Hugo releases a breaking change and the runner silently upgrades, your build breaks with no obvious reason. Pinning to &lt;code&gt;0.120.4&lt;/code&gt; means that exact version runs every time, regardless of what the runner has.&lt;/p&gt;
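&lt;p&gt;Pinning the tag still trusts that the tag is never re-pushed. If you want the build fully immutable, Docker can pin by image digest instead of tag — a sketch, where the &lt;code&gt;sha256&lt;/code&gt; value is a placeholder, not the real digest for this image:&lt;br&gt;
&lt;/p&gt;

```yaml
# Sketch: digest-pinned variant of the build step.
# The digest below is a PLACEHOLDER — resolve the real one first with:
#   docker inspect --format='{{index .RepoDigests 0}}' floryn90/hugo:0.120.4-ext-alpine
- name: Build with Docker (digest-pinned)
  run: |
    docker run --rm \
      --user $(id -u):$(id -g) \
      -v ${{ github.workspace }}:/src \
      -w /src \
      floryn90/hugo@sha256:0000000000000000000000000000000000000000000000000000000000000000 \
      --minify --gc
```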

&lt;p&gt;After the build, two separate artifacts get uploaded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Uncompressed HTML for Lighthouse to audit&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Upload artifact for Lighthouse&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/upload-artifact@v4&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public-site&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public/&lt;/span&gt;

&lt;span class="c1"&gt;# Compressed tarball for GitHub Pages deploy&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Upload artifact for Pages&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/upload-pages-artifact@v3&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are different formats. &lt;code&gt;actions/deploy-pages&lt;/code&gt; requires its own specific artifact format from &lt;code&gt;upload-pages-artifact&lt;/code&gt;. You cannot reuse the same artifact for both — something I discovered when the deploy step failed because it could not find the right artifact format.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 6: The Lighthouse Audit
&lt;/h3&gt;

&lt;p&gt;Before anything reaches production, Lighthouse audits the built site against enforced thresholds. If scores drop below the minimums, the deploy is blocked.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;lighthouse&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Lighthouse Audit&lt;/span&gt;
  &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
  &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Download artifact&lt;/span&gt;
      &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/download-artifact@v4&lt;/span&gt;
      &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ !failure() &amp;amp;&amp;amp; !cancelled() }}&lt;/span&gt;
      &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public-site&lt;/span&gt;
        &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public/&lt;/span&gt;

    &lt;span class="c1"&gt;# SRE fix: remove non-content files before audit&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prune non-content files from audit&lt;/span&gt;
      &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rm -f public/google*.html public/404.html&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Serve &amp;amp; audit&lt;/span&gt;
      &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;treosh/lighthouse-ci-action@v11&lt;/span&gt;
      &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;uploadArtifacts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;temporaryPublicStorage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;configPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.lighthouserc.json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The file pruning step was another real discovery. Google Search Console verification files and custom 404 pages were causing Lighthouse to audit them as content pages and fail on them. They are not real content — removing them before the audit prevents false failures on files I do not control. It is a small fix that took some head-scratching to figure out.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;.lighthouserc.json&lt;/code&gt; configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ci"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"collect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"staticDistDir"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"./public"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"numberOfRuns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"upload"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"target"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"temporary-public-storage"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"assert"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"assertions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"categories:performance"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minScore"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"categories:accessibility"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minScore"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"categories:best-practices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"warn"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minScore"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"categories:seo"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"warn"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minScore"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running the audit twice reduces the chance that a single slow network moment on GitHub's shared runners falsely blocks a legitimate deploy (Lighthouse CI asserts against the representative run, not a one-off). Performance and accessibility are hard errors — below 90 blocks the deploy. Best practices and SEO are warnings — tracked but not blocking.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 7: Secretless Deployment with OIDC
&lt;/h3&gt;

&lt;p&gt;The deploy stage was the one I was most curious about going in. Most tutorials tell you to generate an API token, store it in GitHub Secrets, and use it on every deploy. That works, but it means there is a long-lived credential sitting in your secrets that stays valid until you manually rotate it.&lt;/p&gt;

&lt;p&gt;OIDC is different. Instead of a stored token, GitHub Actions generates a short-lived cryptographic proof of identity at runtime. It proves who it is, completes the deploy, and the token expires minutes later. There is nothing to store, nothing to rotate, and nothing to leak.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy to GitHub Pages&lt;/span&gt;
  &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
  &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;lighthouse&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ !failure() &amp;amp;&amp;amp; !cancelled() }}&lt;/span&gt;
  &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github-pages&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.deployment.outputs.page_url }}&lt;/span&gt;
  &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy to GitHub Pages&lt;/span&gt;
      &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deployment&lt;/span&gt;
      &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/deploy-pages@v4&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The OIDC capability comes from the global permissions block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;
  &lt;span class="na"&gt;pages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;
  &lt;span class="na"&gt;id-token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt; &lt;span class="c1"&gt;# enables OIDC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;id-token: write&lt;/code&gt; line is what enables the handshake. Without it, GitHub Actions cannot request the short-lived identity token and the deploy fails.&lt;/p&gt;
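&lt;p&gt;If you want to see the handshake with your own eyes, a job with &lt;code&gt;id-token: write&lt;/code&gt; can request the token itself through environment variables GitHub injects into the run. A rough sketch — the audience value is arbitrary here, and you should only ever confirm a token was issued, never print it:&lt;br&gt;
&lt;/p&gt;

```yaml
# Sketch: request the short-lived OIDC token manually to see the handshake.
# Works only inside a GitHub Actions job with `id-token: write`.
- name: Request an OIDC token
  run: |
    TOKEN=$(curl -sS \
      -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
      "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=demo" | jq -r '.value')
    # Never echo the token itself; just confirm one was issued.
    echo "Received a signed OIDC JWT (${#TOKEN} chars), valid for minutes, then gone."
```

&lt;p&gt;&lt;code&gt;actions/deploy-pages&lt;/code&gt; does the equivalent internally — the point is that the credential never exists before or after the run.&lt;/p&gt;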




&lt;h3&gt;
  
  
  Chapter 8: The Moment It All Worked
&lt;/h3&gt;

&lt;p&gt;I am not going to pretend the first run was clean. There were multiple failed runs — the permissions error on the Docker build, the artifact format mismatch, the Lighthouse false failures on the verification files. Each one took some digging to understand and fix.&lt;/p&gt;

&lt;p&gt;But when I finally pushed a commit and watched all five jobs turn green in the GitHub Actions tab — Gitleaks, CodeQL, Build Hugo, Lighthouse Audit, Deploy to GitHub Pages, all green — it felt genuinely good. Not just because it worked, but because I actually understood &lt;em&gt;why&lt;/em&gt; each piece was there and what it was doing.&lt;/p&gt;

&lt;p&gt;What surprised me most was how well everything worked together once it was wired up correctly. GitHub Actions triggering the pipeline, GitHub Pages serving the site, Cloudflare picking it up for DNS and CDN — the whole chain from &lt;code&gt;git push&lt;/code&gt; to live site update was faster and more seamless than I expected. The integration between these tools is genuinely impressive.&lt;/p&gt;




&lt;h3&gt;
  
  
  Branch Protection — Locking It In
&lt;/h3&gt;

&lt;p&gt;The pipeline means nothing if you can bypass it by pushing directly to &lt;code&gt;master&lt;/code&gt; with no checks running. In your GitHub repository, go to &lt;strong&gt;Settings → Branches → Add branch protection rule&lt;/strong&gt; for &lt;code&gt;master&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Require status checks to pass before merging

&lt;ul&gt;
&lt;li&gt;Add: &lt;code&gt;Gitleaks&lt;/code&gt;, &lt;code&gt;CodeQL&lt;/code&gt;, &lt;code&gt;Build Hugo&lt;/code&gt;, &lt;code&gt;Lighthouse Audit&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;✅ Require branches to be up to date before merging&lt;/li&gt;

&lt;li&gt;✅ Do not allow bypassing the above settings&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Now the pipeline is the only path to production. Not even the repo owner can push directly to master and skip it.&lt;/p&gt;
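&lt;p&gt;Clicking through the UI works, but it is the one step here that is not captured in code. The same rule can be applied through GitHub's REST API — a sketch using the &lt;code&gt;gh&lt;/code&gt; CLI (requires an authenticated session; &lt;code&gt;OWNER/REPO&lt;/code&gt; are placeholders):&lt;br&gt;
&lt;/p&gt;

```shell
# Sketch: apply the branch protection rule via the API instead of the UI.
# OWNER/REPO are placeholders for your repository.
gh api -X PUT repos/OWNER/REPO/branches/master/protection \
  --input - <<'EOF'
{
  "required_status_checks": {
    "strict": true,
    "contexts": ["Gitleaks", "CodeQL", "Build Hugo", "Lighthouse Audit"]
  },
  "enforce_admins": true,
  "required_pull_request_reviews": null,
  "restrictions": null
}
EOF
```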




&lt;h3&gt;
  
  
  The Complete Workflow File
&lt;/h3&gt;

&lt;p&gt;The full &lt;code&gt;.github/workflows/deploy.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DevSecOps CI/CD Pipeline&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;master&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;master&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;
  &lt;span class="na"&gt;pages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;
  &lt;span class="na"&gt;id-token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;secret-scan&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Gitleaks&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;fetch-depth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gitleaks/gitleaks-action@v2&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;GITHUB_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GITHUB_TOKEN }}&lt;/span&gt;

  &lt;span class="na"&gt;codeql&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CodeQL&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;security-events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;
      &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github/codeql-action/init@v3&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;languages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;javascript&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github/codeql-action/autobuild@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github/codeql-action/analyze@v3&lt;/span&gt;

  &lt;span class="na"&gt;dependency-review&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dependency Review&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github.event_name == 'pull_request'&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/dependency-review-action@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;fail-on-severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;high&lt;/span&gt;

  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build Hugo&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;secret-scan&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;codeql&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;dependency-review&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ !failure() &amp;amp;&amp;amp; !cancelled() }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;submodules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build with Docker&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;docker run --rm \&lt;/span&gt;
            &lt;span class="s"&gt;--user $(id -u):$(id -g) \&lt;/span&gt;
            &lt;span class="s"&gt;-v ${{ github.workspace }}:/src \&lt;/span&gt;
            &lt;span class="s"&gt;-w /src \&lt;/span&gt;
            &lt;span class="s"&gt;floryn90/hugo:0.120.4-ext-alpine \&lt;/span&gt;
            &lt;span class="s"&gt;--minify --gc&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Upload artifact for Lighthouse&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/upload-artifact@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public-site&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public/&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Upload artifact for Pages&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/upload-pages-artifact@v3&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public/&lt;/span&gt;

  &lt;span class="na"&gt;lighthouse&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Lighthouse Audit&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ !failure() &amp;amp;&amp;amp; !cancelled() }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Download artifact&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/download-artifact@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public-site&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public/&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prune non-content files from audit&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rm -f public/google*.html public/404.html&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Serve &amp;amp; audit&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;treosh/lighthouse-ci-action@v11&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;uploadArtifacts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
          &lt;span class="na"&gt;temporaryPublicStorage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
          &lt;span class="na"&gt;configPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.lighthouserc.json&lt;/span&gt;

  &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy to GitHub Pages&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;lighthouse&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ !failure() &amp;amp;&amp;amp; !cancelled() }}&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github-pages&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.deployment.outputs.page_url }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy to GitHub Pages&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deployment&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/deploy-pages@v4&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  What I Took Away From This
&lt;/h3&gt;

&lt;p&gt;Building this taught me that CI/CD pipelines are not magic — they are just a series of jobs with dependencies between them, each one doing one specific thing. Once you understand the dependency graph, the YAML almost writes itself.&lt;/p&gt;

&lt;p&gt;The parts that actually taught me the most were not the parts that worked. They were the failed runs — figuring out why the Docker build was writing files as root, why Lighthouse was failing on a Google verification file, why the artifact format mattered. Those failures forced me to actually understand what was happening rather than just accepting that it worked.&lt;/p&gt;

&lt;p&gt;Tools like GitHub Actions, Gitleaks, CodeQL, and Lighthouse are not intimidating once you have broken them and fixed them yourself. That is the thing about learning by building — the failures are the lesson.&lt;/p&gt;




&lt;h3&gt;
  
  
  What is Next?
&lt;/h3&gt;

&lt;p&gt;In &lt;strong&gt;Part 2&lt;/strong&gt; of this series, we will bring the infrastructure itself under version control. All 12 Cloudflare DNS records, the Zero Trust tunnel configuration, and the Oracle Cloud instance — managed by Terraform, with state stored in AWS S3, and changes deployed through the same pipeline we built here.&lt;/p&gt;

&lt;p&gt;The same discipline applied to code delivery, applied to the servers themselves.&lt;/p&gt;

&lt;p&gt;Stay tuned, and happy building!&lt;/p&gt;




&lt;h3&gt;
  
  
  Appendix: The &lt;code&gt;.lighthouserc.json&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ci"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"collect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"staticDistDir"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"./public"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"numberOfRuns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"upload"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"target"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"temporary-public-storage"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"assert"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"assertions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"categories:performance"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minScore"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"categories:accessibility"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minScore"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"categories:best-practices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"warn"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minScore"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"categories:seo"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"warn"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minScore"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>devops</category>
      <category>github</category>
      <category>cicd</category>
      <category>docker</category>
    </item>
    <item>
      <title>Building an LLM-Powered Log Triage Pipeline with Python and DeepSeek-R1</title>
      <dc:creator>Prajwol Adhikari</dc:creator>
      <pubDate>Sun, 10 May 2026 00:53:54 +0000</pubDate>
      <link>https://forem.com/prajwol-ad/building-an-llm-powered-log-triage-pipeline-with-python-and-deepseek-r1-4n0m</link>
      <guid>https://forem.com/prajwol-ad/building-an-llm-powered-log-triage-pipeline-with-python-and-deepseek-r1-4n0m</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;I have Prometheus and Grafana monitoring my homelab. I have Alertmanager sending Discord notifications when a node goes down. But there was a gap in the middle that kept bugging me.&lt;/p&gt;

&lt;p&gt;Prometheus tells me &lt;em&gt;that&lt;/em&gt; something is wrong. CPU is high. A container restarted. A scrape target is unreachable. What it does not tell me is &lt;em&gt;why&lt;/em&gt;. For that, you need to read the logs. And reading Docker logs across multiple containers, multiple times a day, is the kind of task that feels productive for about ten minutes before you start skimming and missing things.&lt;/p&gt;

&lt;p&gt;So I built something to read them for me. A Python script that runs every 15 minutes, pulls Docker container logs, checks for anything that looks critical, and sends the critical stuff to a small language model running on my Oracle Cloud instance. The model reads the raw log entry and writes a plain-English summary. That summary gets posted to a Discord channel.&lt;/p&gt;

&lt;p&gt;Instead of me reading through hundreds of log lines and hoping I notice the important one, an LLM reads them and only bothers me when something actually matters.&lt;/p&gt;

&lt;p&gt;This is not a fancy AI agent with tool use and multi-step reasoning. It is a straightforward automation — rules-based triage plus an LLM for summarization. But it solves a real problem I was actually having, and it taught me a lot about how to practically integrate an LLM into an infrastructure workflow.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why not just use Alertmanager for everything?
&lt;/h3&gt;

&lt;p&gt;Fair question. Alertmanager handles the metrics side well — if CPU spikes above 90% for five minutes, or if a node goes unreachable, it fires an alert. But metrics and logs are different things.&lt;/p&gt;

&lt;p&gt;A container can be running fine from a metrics perspective — CPU normal, memory stable, responding to health checks — but still be logging errors internally. Maybe it is failing to connect to an upstream API. Maybe it is retrying a database connection every 30 seconds. Maybe there is a deprecation warning that will become a breaking change next release. None of that shows up in Prometheus metrics. All of it shows up in logs.&lt;/p&gt;

&lt;p&gt;The log triage pipeline covers the gap between "the container is running" and "the container is healthy."&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 1: The Architecture
&lt;/h3&gt;

&lt;p&gt;The pipeline has four components spread across two machines:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On my local server (Waco, Texas):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Python script that reads Docker logs and classifies severity&lt;/li&gt;
&lt;li&gt;A cron job that runs the script every 15 minutes&lt;/li&gt;
&lt;li&gt;Docker, whose containers produce the logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;On the Oracle Cloud instance (Phoenix, Arizona):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ollama, serving the DeepSeek-R1 1.5B model as a REST API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;In between:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tailscale, connecting both machines over an encrypted mesh VPN&lt;/li&gt;
&lt;li&gt;Discord webhooks, receiving the final alert messages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The separation is intentional. The LLM runs on the Oracle instance because it has 24GB of RAM — enough to load a small model comfortably. My local server has less headroom, and I did not want model inference competing with the Docker services it is supposed to be monitoring.&lt;/p&gt;

&lt;p&gt;The Python script calls the Ollama API over Tailscale, so the traffic never touches the public internet. The model endpoint is not exposed to anyone outside my Tailscale network.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 2: Setting Up Ollama and DeepSeek-R1
&lt;/h3&gt;

&lt;p&gt;Ollama makes self-hosting a language model surprisingly painless. On the Oracle instance, the setup was:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh
ollama pull deepseek-r1:1.5b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is it. Ollama downloads the model and serves it as a REST API on port 11434. You can test it immediately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:11434/api/generate &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
  "model": "deepseek-r1:1.5b",
  "prompt": "Summarize this log entry: ERROR: database connection refused at 10.0.0.5:5432, retrying in 30s",
  "stream": false
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And it responds with a natural-language summary of what the log entry means.&lt;/p&gt;

&lt;p&gt;I chose the 1.5B parameter model for a reason. It is small enough to run on the Oracle ARM instance without maxing out memory, and fast enough that inference takes a few seconds per log entry rather than minutes. For summarizing log lines, you do not need GPT-4 level intelligence. You need something that can read a stack trace and say "the database connection is failing" in plain English. The 1.5B model does that reliably.&lt;/p&gt;

&lt;p&gt;A larger model would produce slightly more polished summaries, but the latency and memory tradeoff is not worth it for an automation that runs every 15 minutes. I would rather have fast and good enough than slow and perfect.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 3: The Python Script — Rules First, LLM Second
&lt;/h3&gt;

&lt;p&gt;This is where the design decision that matters most lives. The script does not send every log line to the LLM. That would be slow, expensive on compute, and pointless — most log lines are routine. Instead, it uses a two-stage approach:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 1: Rules-based severity classification.&lt;/strong&gt; The script reads the last 15 minutes of logs from each Docker container using &lt;code&gt;docker logs --since 15m&lt;/code&gt;. It then checks each line against a set of keyword patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lines containing &lt;code&gt;error&lt;/code&gt;, &lt;code&gt;fatal&lt;/code&gt;, &lt;code&gt;critical&lt;/code&gt;, &lt;code&gt;OOM&lt;/code&gt;, &lt;code&gt;killed&lt;/code&gt;, &lt;code&gt;panic&lt;/code&gt;, &lt;code&gt;exception&lt;/code&gt; → classified as critical&lt;/li&gt;
&lt;li&gt;Lines containing &lt;code&gt;warn&lt;/code&gt;, &lt;code&gt;timeout&lt;/code&gt;, &lt;code&gt;retry&lt;/code&gt;, &lt;code&gt;refused&lt;/code&gt; → classified as warning&lt;/li&gt;
&lt;li&gt;Everything else → ignored&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is intentionally simple. I am not trying to build a perfect classifier. I am trying to filter out the 95% of log lines that say things like "request completed in 12ms" so the LLM only has to deal with the 5% that might actually matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 2: LLM summarization.&lt;/strong&gt; Only the lines classified as critical get sent to DeepSeek. The prompt is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a DevOps engineer reviewing system logs.
Summarize the following log entry in one or two sentences.
Explain what happened and whether immediate action is needed.

Log entry:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;log_line&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
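&lt;p&gt;Wiring that prompt into Ollama is a single POST to &lt;code&gt;/api/generate&lt;/code&gt;. A sketch of how the call could look, using only the standard library (the real script may differ, and the Tailscale hostname here is a placeholder):&lt;/p&gt;

```python
import json
import urllib.request

OLLAMA_URL = "http://oracle-node:11434/api/generate"  # Tailscale hostname is a placeholder

def build_payload(log_line: str) -> dict:
    """Assemble the non-streaming generate request for DeepSeek-R1."""
    prompt = (
        "You are a DevOps engineer reviewing system logs.\n"
        "Summarize the following log entry in one or two sentences.\n"
        "Explain what happened and whether immediate action is needed.\n\n"
        f"Log entry:\n{log_line}"
    )
    return {"model": "deepseek-r1:1.5b", "prompt": prompt, "stream": False}

def summarize(log_line: str) -> str:
    """POST to Ollama over Tailscale and return the plain-English summary."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(log_line)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        # Non-streaming responses carry the full completion in "response"
        return json.load(response)["response"].strip()
```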



&lt;p&gt;The model returns a summary like: "The Grafana container failed to authenticate with its PostgreSQL backend. The connection was refused, suggesting the database container may be down or the credentials have changed. Immediate investigation recommended."&lt;/p&gt;

&lt;p&gt;That summary is what gets posted to Discord — not the raw log line, but the plain-English interpretation of it.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 4: The Discord Integration
&lt;/h3&gt;

&lt;p&gt;Discord webhooks are probably the simplest notification integration you can set up. You create a webhook URL in your Discord server settings, and then posting to it is one HTTP request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_discord_alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;container_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;webhook_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-discord-webhook-url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🔴 &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; — &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;container_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;color&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;15158332&lt;/span&gt;  &lt;span class="c1"&gt;# red
&lt;/span&gt;        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;webhook_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The embed format gives you a clean, colored card in Discord rather than a wall of text. Critical alerts show up in red. Warnings could show up in yellow if I ever decide to surface those too — for now I only send critical ones to keep the noise low.&lt;/p&gt;

&lt;p&gt;The webhook URL is stored as an environment variable, not hardcoded. I learned this the hard way earlier in the project when I accidentally shared webhook URLs in a chat and had to regenerate them. Treat webhook URLs like API keys — anyone with the URL can post to your channel.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 5: The Cron Job
&lt;/h3&gt;

&lt;p&gt;The script runs every 15 minutes via cron on my local server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;*&lt;/span&gt;/15 &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; /usr/bin/python3 /home/user/scripts/log-triage.py &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /var/log/log-triage.log 2&amp;gt;&amp;amp;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fifteen minutes is a balance between responsiveness and noise. Every 5 minutes would catch things faster but generate more Discord traffic during noisy periods (like when I am actively deploying something and containers are restarting). Every hour would miss things for too long. Fifteen minutes means I find out about a critical issue within fifteen minutes — which for a homelab is perfectly fine.&lt;/p&gt;

&lt;p&gt;The output gets appended to its own log file, which is a bit meta — the log triage tool has its own logs. But it is useful for debugging when the script itself fails, which happened more than once during development.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 6: What I Learned Building This
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The rules-based first stage is doing most of the work.&lt;/strong&gt; I originally planned to send all logs to the LLM and let it figure out what was important. That was a mistake. The model was slow, the responses were inconsistent for routine log lines, and the Discord channel was flooded with summaries of perfectly normal events. Adding the keyword filter in front cut the LLM calls by about 95% and made the whole pipeline actually useful.&lt;/p&gt;

&lt;p&gt;This is a pattern I have seen in every discussion about production LLM systems: you almost always want a cheap, fast filter in front of the expensive, slow model. Let the simple rules handle the simple cases. Only escalate to the LLM when something actually needs interpretation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Small models are fine for specific tasks.&lt;/strong&gt; There is a temptation to reach for the biggest model you can run. But for log summarization, the 1.5B parameter model produces perfectly adequate output. It occasionally misses nuance that a larger model would catch, but the summaries are accurate enough to tell me whether I need to investigate further. For an alerting pipeline, "accurate enough to trigger investigation" is the right bar — not "perfect analysis."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-hosting has real advantages for this use case.&lt;/strong&gt; I could have called an external API like OpenAI or Anthropic instead of running my own model. But there are three reasons I did not:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cost — at 96 runs per day, even cheap API calls add up over months. The Oracle instance is free tier.&lt;/li&gt;
&lt;li&gt;Privacy — I am sending my infrastructure logs to the model. Even in a homelab, I would rather not send container logs to a third-party API.&lt;/li&gt;
&lt;li&gt;Latency — the Ollama instance responds in 2-3 seconds over Tailscale. An API call over the internet would be similar, but with more variable latency and the possibility of rate limiting.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;This is not an AI agent.&lt;/strong&gt; I want to be clear about what this is and what it is not. An agent makes decisions and takes actions — it might read a log, decide the database needs restarting, and execute the restart. This pipeline does not do that. It reads logs, summarizes them, and tells me about them. I am still the one who decides what to do. That is a deliberate choice — I am not comfortable with automated remediation on infrastructure I actually depend on. Maybe in a future iteration.&lt;/p&gt;




&lt;h3&gt;
  
  
  What Could Be Better
&lt;/h3&gt;

&lt;p&gt;There are obvious improvements I have not made yet:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Smarter classification.&lt;/strong&gt; The keyword matching is crude. "Error" in a log line is not always an error — sometimes it is a log line about error handling working correctly, like "recovered from error successfully." A more sophisticated approach would use regex patterns tuned per container, or even a small classifier model. For now, the false positive rate is low enough that I live with it.&lt;/p&gt;
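&lt;p&gt;A sketch of what that smarter classification could look like (the patterns below are hypothetical, not tuned rules from my stack):&lt;/p&gt;

```python
import re

# Hypothetical two-pass classifier: harmless patterns are checked
# before the error regex, so a line like "recovered from error
# successfully" no longer trips the filter.
HARMLESS = re.compile(r"recovered from error", re.IGNORECASE)
ERRORISH = re.compile(r"\b(?:error|fatal|panic)\b", re.IGNORECASE)

def classify_line(line: str) -> str:
    if HARMLESS.search(line):
        return "ok"            # error handling working as intended
    return "escalate" if ERRORISH.search(line) else "ok"
```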

&lt;p&gt;&lt;strong&gt;Log aggregation with Loki.&lt;/strong&gt; Right now, the script runs &lt;code&gt;docker logs&lt;/code&gt; on each container individually. If I set up Grafana Loki, all container logs would flow into a central store, and the script could query Loki instead of Docker directly. That is a cleaner architecture and it is on my roadmap for a future phase.&lt;/p&gt;
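&lt;p&gt;Sketched against Loki's query_range HTTP API (the &lt;code&gt;container&lt;/code&gt; label is an assumption about how I would configure Promtail):&lt;/p&gt;

```python
import time

# Loki variant sketch: build a range query per container label
# instead of shelling out to `docker logs`. The "container" label
# is an assumption about the Promtail scrape config.
def loki_params(container: str, minutes: int = 15, limit: int = 30) -> dict:
    now_ns = time.time_ns()
    return {
        "query": f'{{container="{container}"}}',  # LogQL stream selector
        "start": now_ns - minutes * 60 * 10**9,   # nanosecond timestamps
        "end": now_ns,
        "limit": limit,
    }

def flatten_streams(payload: dict) -> list[str]:
    """Pull raw log lines out of a query_range JSON response."""
    lines = []
    for stream in payload.get("data", {}).get("result", []):
        lines.extend(line for _ts, line in stream.get("values", []))
    return lines
```

&lt;p&gt;The request itself would be a plain &lt;code&gt;requests.get&lt;/code&gt; against &lt;code&gt;/loki/api/v1/query_range&lt;/code&gt; on the Loki host with those params; everything downstream of &lt;code&gt;get_container_logs&lt;/code&gt; would stay the same.&lt;/p&gt;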

&lt;p&gt;&lt;strong&gt;Alert deduplication.&lt;/strong&gt; If a container logs the same error repeatedly (like a connection retry every 30 seconds), the script will send the same alert multiple times. I should add a simple cache that tracks recently seen errors and suppresses duplicates within a time window.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Monitoring Stack So Far
&lt;/h3&gt;

&lt;p&gt;This pipeline sits alongside the rest of the observability stack I have been building across the hybrid cloud project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prometheus&lt;/strong&gt; scrapes system metrics (CPU, memory, disk, network) from three geographically distributed nodes — my local server in Texas, Oracle Cloud in Arizona, and a shell server in the Netherlands.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grafana&lt;/strong&gt; visualizes those metrics on dashboards.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alertmanager&lt;/strong&gt; fires alerts to Discord when metric-based rules trigger (like a node going unreachable).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;This Python pipeline&lt;/strong&gt; covers the log side — reading container logs, summarizing critical entries with DeepSeek, and posting summaries to Discord.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, they give me visibility into both the system-level health (metrics) and the application-level behavior (logs) of the homelab. Not bad for infrastructure running on a laptop and a free-tier cloud instance.&lt;/p&gt;




&lt;h3&gt;
  
  
  Appendix: The Complete Script
&lt;/h3&gt;

&lt;p&gt;Here is a cleaned-up version of the script. Replace the placeholder values with your own container names, Ollama endpoint, and Discord webhook URL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#!/usr/bin/env python3
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
LLM-Augmented Log Triage Pipeline
Rules-based severity classification + DeepSeek-R1 summarization.
Runs via cron every 15 minutes.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="c1"&gt;# ── Configuration ──────────────────────────────────────────────
&lt;/span&gt;&lt;span class="n"&gt;DISCORD_WEBHOOK_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DISCORD_WEBHOOK_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;DISCORD_WEBHOOK_URL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DISCORD_WEBHOOK_URL not set&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;OLLAMA_URL&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://&amp;lt;your-ollama-host&amp;gt;:11434/api/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;OLLAMA_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-r1:1.5b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Containers to monitor — adjust to match your Docker stack
&lt;/span&gt;&lt;span class="n"&gt;CONTAINERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prometheus&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grafana&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;alertmanager&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nginx-proxy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;adguard&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# ── Stage 1: Rules-based triage ────────────────────────────────
&lt;/span&gt;
&lt;span class="c1"&gt;# Keywords that trigger LLM analysis
&lt;/span&gt;&lt;span class="n"&gt;ESCALATE_KEYWORDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fatal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;panic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;oom&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;killed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;out of memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;disk full&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;no space left&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;corruption&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;segfault&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exception&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unauthorized&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authentication failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;permission denied&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;container exited&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exit code 1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exit code 2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Known-harmless patterns to ignore before keyword matching
&lt;/span&gt;&lt;span class="n"&gt;IGNORE_PATTERNS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filter update&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="c1"&gt;# adguard routine
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nginx reloaded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="c1"&gt;# proxy routine
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;certificate renewed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# TLS renewal noise
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;checkpoint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;             &lt;span class="c1"&gt;# prometheus WAL compaction
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compacted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;              &lt;span class="c1"&gt;# prometheus normal
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;watching for new ooms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# cadvisor startup
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_container_logs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Pull the last N lines of logs from a Docker container.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;logs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--tail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;1500&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No output.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;should_analyze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Rules-based filter. Strips known-harmless patterns first,
    then checks for escalation keywords.
    Returns (needs_analysis: bool, matched_keyword: str or None).
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;logs_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;IGNORE_PATTERNS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;logs_lower&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;logs_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logs_lower&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ESCALATE_KEYWORDS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;logs_lower&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;


&lt;span class="c1"&gt;# ── Stage 2: LLM summarization ────────────────────────────────
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_with_ai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trigger_keyword&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Send critical logs to DeepSeek for plain-English summarization.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an SRE. A Docker container triggered an alert.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Container: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Trigger keyword found: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;trigger_keyword&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Logs:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain in 2-3 sentences:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1. What is the actual problem?&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2. How severe is it: critical or warning?&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3. What should the engineer do?&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;OLLAMA_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OLLAMA_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;options&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_predict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_ctx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# DeepSeek-R1 wraps reasoning in &amp;lt;think&amp;gt; tags — strip them
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;think&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;/think&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Determine severity from the model's response
&lt;/span&gt;        &lt;span class="n"&gt;raw_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;severity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;warning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;critical&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;raw_lower&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;not critical&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;raw_lower&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;severity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;critical&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;severity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI analysis failed: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;severity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;warning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="c1"&gt;# ── Discord alerting ───────────────────────────────────────────
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_discord_alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trigger_keyword&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analysis_result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Post a formatted embed to Discord with the LLM summary.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;severity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analysis_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;severity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;warning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;colors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;critical&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mh"&gt;0xF85149&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;warning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mh"&gt;0xE3B341&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alert — &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;color&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;colors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0xE3B341&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fields&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Container&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;`&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;`&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Severity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Trigger keyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;`&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;trigger_keyword&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;`&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI Analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;analysis_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d %H:%M:%S&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;footer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rules triage + DeepSeek-R1 1.5B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DISCORD_WEBHOOK_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Discord failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# ── Main loop ──────────────────────────────────────────────────
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%H&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;M&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] Log triage starting...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;escalated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;container&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;CONTAINERS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_container_logs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;needs_analysis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;should_analyze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;needs_analysis&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;analyze_with_ai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;send_discord_alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;escalated&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Done. &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;escalated&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CONTAINERS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; containers escalated.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To run it on a 15-minute schedule, add a cron job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crontab &lt;span class="nt"&gt;-e&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;*/15 * * * * DISCORD_WEBHOOK_URL="https://discord.com/api/webhooks/your-webhook-here" /usr/bin/python3 /path/to/log-triage.py &amp;gt;&amp;gt; /var/log/log-triage.log 2&amp;gt;&amp;amp;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
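
&lt;p&gt;After the first scheduled run, it is worth confirming that the job actually fired. A quick sanity check (the log path is the one used in the crontab line above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Show the installed schedule and the most recent triage output
crontab -l
tail -n 20 /var/log/log-triage.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
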






&lt;h3&gt;
  
  
  What's Next
&lt;/h3&gt;

&lt;p&gt;The hybrid cloud series continues with &lt;strong&gt;Part 3: K3s Kubernetes Cluster&lt;/strong&gt; — setting up a K3s cluster with my local server as the control plane and the Oracle Cloud instance as a worker node, connected over Tailscale. Once that is running, I plan to containerize this log triage pipeline itself and deploy it as a Kubernetes workload, shipped through the CI/CD pipeline I built in Part 1. That would close the loop — the monitoring tool running inside the system it monitors, delivered through the same pipeline as everything else.&lt;/p&gt;

&lt;p&gt;Stay tuned, and happy building.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>deepseek</category>
      <category>python</category>
      <category>oracle</category>
    </item>
    <item>
      <title>Part 5: Securing a Homelab with Cloudflare Tunnels and Zero Trust</title>
      <dc:creator>Prajwol Adhikari</dc:creator>
      <pubDate>Sun, 10 May 2026 00:51:09 +0000</pubDate>
      <link>https://forem.com/prajwol-ad/part-5-securing-a-homelab-with-cloudflare-tunnels-and-zero-trust-52gi</link>
      <guid>https://forem.com/prajwol-ad/part-5-securing-a-homelab-with-cloudflare-tunnels-and-zero-trust-52gi</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Welcome to Part 5 of this homelab series! In the previous parts, we've built a Debian server, deployed a full suite of services with Docker, and set up a high-availability DNS network. But one critical piece is still missing: end-to-end security.&lt;/p&gt;

&lt;p&gt;Until now, we've been accessing local services via &lt;code&gt;http://grafana.local&lt;/code&gt;, which browsers correctly flag as "Not Secure." The common solution is to open ports 80 and 443 on our router, but that exposes our server and home network to the entire internet—a huge security risk.&lt;/p&gt;

&lt;p&gt;In this guide, we'll walk through the ultimate solution: using a &lt;strong&gt;Cloudflare Tunnel&lt;/strong&gt; and a public domain to get 100% free, valid &lt;strong&gt;HTTPS&lt;/strong&gt; certificates for all &lt;em&gt;internal&lt;/em&gt; services, all with &lt;strong&gt;zero open ports&lt;/strong&gt; on the router. We'll also lock everything down behind Cloudflare's Zero Trust platform, so only authorized users can access them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Chapter 1: The Domain Advantage
&lt;/h3&gt;

&lt;p&gt;To make this work, a public domain (e.g., &lt;code&gt;your-domain.com&lt;/code&gt;) is required. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A quick tip for Nepali citizens:&lt;/strong&gt; You can register a &lt;code&gt;.com.np&lt;/code&gt; domain for free for life, which is an incredible resource for projects like this.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The goal is to create secure, public-facing subdomains for our private services (like &lt;code&gt;grafana.your-domain.com&lt;/code&gt;) without actually exposing our server.&lt;/p&gt;

&lt;h3&gt;
  
  
  Chapter 2: Setting Up the Cloudflare Tunnel
&lt;/h3&gt;

&lt;p&gt;A Cloudflare Tunnel is a secure, outbound-only connection from a connector (a small piece of software) running on our server to the Cloudflare network. This means no inbound ports are needed.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create a Zero Trust Account:&lt;/strong&gt; First, log into the Cloudflare dashboard, go to the &lt;strong&gt;Zero Trust&lt;/strong&gt; menu, and sign up for the free plan. You will be asked to choose a "team name" (e.g., &lt;code&gt;my-lab&lt;/code&gt;), which creates a unique login URL for your account.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create the Tunnel:&lt;/strong&gt; In the Zero Trust dashboard, navigate to &lt;strong&gt;Networks &amp;gt; Tunnels&lt;/strong&gt; and click "Create a tunnel".&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Choose &lt;strong&gt;"Cloudflared"&lt;/strong&gt; as the connector type and give the tunnel a name, like &lt;code&gt;homelab-debian&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Get the Token:&lt;/strong&gt; Cloudflare then presents options for installing the connector. Select &lt;strong&gt;Docker&lt;/strong&gt;, which provides a &lt;code&gt;docker run&lt;/code&gt; command containing a unique, secret token.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Chapter 3: Deploying the &lt;code&gt;cloudflared&lt;/code&gt; Connector
&lt;/h3&gt;

&lt;p&gt;Instead of just running the &lt;code&gt;docker run&lt;/code&gt; command, using &lt;code&gt;docker-compose.yml&lt;/code&gt; is much better for long-term management.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Create a new directory on the server:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/docker/cloudflared
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/docker/cloudflared
nano docker-compose.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Paste in the following configuration, using the &lt;strong&gt;token&lt;/strong&gt; from the Cloudflare dashboard. It's critical to connect this container to the &lt;code&gt;npm_default&lt;/code&gt; network created in Part 2.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;cloudflared&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cloudflare/cloudflared:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cloudflared-tunnel&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tunnel --no-autoupdate run --token &amp;lt;YOUR_TOKEN_HERE&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;npm_default&lt;/span&gt;

&lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;npm_default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;external&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Launch the container: &lt;code&gt;docker compose up -d&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Back in the Cloudflare dashboard, the "Connectors" section for the tunnel should now show a &lt;strong&gt;"Healthy"&lt;/strong&gt; status.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
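
&lt;p&gt;If the dashboard does not show "Healthy", the container logs usually explain why (an expired token, DNS resolution failures, and so on). A quick check, assuming the container name from the compose file above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Follow the connector logs; a working tunnel logs "Registered tunnel connection"
docker logs -f cloudflared-tunnel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
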

&lt;h3&gt;
  
  
  Chapter 4: Routing Traffic - Cloudflare to NPM
&lt;/h3&gt;

&lt;p&gt;Now, we need to tell the tunnel where to send incoming traffic. The goal is to send &lt;em&gt;all&lt;/em&gt; traffic for our subdomains to one place: &lt;strong&gt;Nginx Proxy Manager (NPM)&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; In the tunnel's configuration, go to the &lt;strong&gt;"Published application routes"&lt;/strong&gt; tab.&lt;/li&gt;
&lt;li&gt; Click &lt;strong&gt;"Add a published application routes"&lt;/strong&gt; and create an entry for each of your services. For Grafana:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subdomain:&lt;/strong&gt; &lt;code&gt;grafana&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain:&lt;/strong&gt; &lt;code&gt;your-domain.com&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service Type:&lt;/strong&gt; &lt;code&gt;HTTP&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;code&gt;http://npm-app-1:80&lt;/code&gt; (This is the container name and internal port for NPM)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; Click &lt;strong&gt;"Save"&lt;/strong&gt; and repeat this for all other services (&lt;code&gt;homer&lt;/code&gt;, &lt;code&gt;prometheus&lt;/code&gt;, etc.). This process automatically creates the public CNAME records in Cloudflare's DNS panel.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Chapter 5: The "Split-Brain DNS" Setup
&lt;/h3&gt;

&lt;p&gt;This setup ensures our new domains work perfectly &lt;em&gt;both&lt;/em&gt; inside and outside our home network.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Publicly (Away from Home):&lt;/strong&gt; This is already done. When a device is on cellular data, it uses public DNS, finds the Cloudflare CNAME, and is securely sent through the tunnel.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Locally (At Home):&lt;/strong&gt; When at home, we don't want traffic going out to the internet and back. We use AdGuard Home to create DNS rewrites.

&lt;ol&gt;
&lt;li&gt; In the AdGuard Home dashboard, go to &lt;strong&gt;Filters &amp;gt; DNS Rewrites&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Add a new rule:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Domain:&lt;/strong&gt; &lt;code&gt;grafana.your-domain.com&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Answer:&lt;/strong&gt; &lt;code&gt;192.168.1.100&lt;/code&gt; (The local IP of your NPM server)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt; Repeat this for &lt;code&gt;homer.your-domain.com&lt;/code&gt;, &lt;code&gt;prometheus.your-domain.com&lt;/code&gt;, etc.&lt;/li&gt;

&lt;/ol&gt;

&lt;/li&gt;

&lt;/ul&gt;
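
&lt;p&gt;A quick way to verify the split-brain behaviour is to query the same name against AdGuard and against a public resolver. At home, the rewrite should answer with the local NPM IP, while the public view returns Cloudflare proxy addresses (the AdGuard IP below is a placeholder; substitute your own):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# At home: the AdGuard rewrite should answer with 192.168.1.100
dig +short grafana.your-domain.com @192.168.1.2   # &amp;lt;-- your AdGuard Home IP

# Public view: should return Cloudflare anycast IPs
dig +short grafana.your-domain.com @1.1.1.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
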

&lt;h3&gt;
  
  
  Chapter 6: The "Split-Brain" SSL Fix (Local &amp;amp; Public)
&lt;/h3&gt;

&lt;p&gt;This setup will create two different SSL errors: &lt;code&gt;ERR_SSL_UNRECOGNIZED_NAME_ALERT&lt;/code&gt; when at home, and &lt;code&gt;ERR_TOO_MANY_REDIRECTS&lt;/code&gt; when on a public network.&lt;/p&gt;

&lt;p&gt;Here is the step-by-step solution that fixes both problems permanently.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 1: Get a Cloudflare API Token&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;NPM needs a way to automatically prove to Cloudflare that it owns the domain.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; In the Cloudflare profile, go to &lt;strong&gt;API Tokens &amp;gt; Create Token&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Use the &lt;strong&gt;"Edit zone DNS"&lt;/strong&gt; template.&lt;/li&gt;
&lt;li&gt; Set &lt;strong&gt;Zone Resources&lt;/strong&gt; to &lt;code&gt;Include&lt;/code&gt; &amp;gt; &lt;code&gt;Specific zone&lt;/code&gt; &amp;gt; &lt;code&gt;your-domain.com&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Click "Continue" and "Create Token". &lt;strong&gt;Copy the generated token&lt;/strong&gt; immediately.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 2: Configure NPM Proxy Host&lt;/strong&gt;
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt; In the Nginx Proxy Manager admin panel, edit the proxy host for &lt;code&gt;grafana.your-domain.com&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; On the &lt;strong&gt;Details Tab&lt;/strong&gt;, make sure the Forward Hostname is &lt;code&gt;grafana&lt;/code&gt; and the Forward Port is &lt;code&gt;3000&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Go to the &lt;strong&gt;SSL Tab&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;For &lt;strong&gt;SSL Certificate&lt;/strong&gt;, choose &lt;strong&gt;"Request a new SSL Certificate"&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Toggle &lt;strong&gt;"Use a DNS Challenge"&lt;/strong&gt; to &lt;strong&gt;ON&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;"Add a new credential"&lt;/strong&gt;, select Cloudflare, and paste in your API token.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CRITICALLY: Toggle "Force SSL" to OFF.&lt;/strong&gt; This is what prevents the redirect loop.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; Click &lt;strong&gt;Save&lt;/strong&gt;. NPM will now use the API token to get a valid Let's Encrypt certificate.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 3: Configure Cloudflare SSL&lt;/strong&gt;
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt; In the main Cloudflare dashboard, for your domain go to &lt;strong&gt;SSL/TLS &amp;gt; Overview&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Set the SSL/TLS encryption mode to &lt;strong&gt;"Full (Strict)"&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This combination is the perfect solution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Locally:&lt;/strong&gt; The browser connects directly to NPM (thanks to the AdGuard rewrite) and is served the valid Let's Encrypt certificate, fixing the &lt;code&gt;ERR_SSL_UNRECOGNIZED_NAME_ALERT&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Publicly:&lt;/strong&gt; Cloudflare enforces &lt;code&gt;HTTPS&lt;/code&gt; (Full Strict). The request goes to NPM, which (with "Force SSL" off) no longer tries to redirect, fixing the &lt;code&gt;ERR_TOO_MANY_REDIRECTS&lt;/code&gt; loop.&lt;/li&gt;
&lt;/ul&gt;
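
&lt;p&gt;To confirm that NPM is really serving the Let's Encrypt certificate locally (rather than its self-signed default), you can inspect the certificate directly. This is a sanity check, not part of the setup:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Print the issuer and validity window of the certificate being served
echo | openssl s_client -connect grafana.your-domain.com:443 \
    -servername grafana.your-domain.com 2&amp;gt;/dev/null \
  | openssl x509 -noout -issuer -dates
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
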

&lt;h3&gt;
  
  
  Chapter 7: The Final Layer - Cloudflare Access (Zero Trust)
&lt;/h3&gt;

&lt;p&gt;Our services are now accessible, but they are public. This final step secures them.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; In the Cloudflare Zero Trust dashboard, go to &lt;strong&gt;Access &amp;gt; Applications&lt;/strong&gt; and click "Add an application".&lt;/li&gt;
&lt;li&gt; Choose &lt;strong&gt;"Self-hosted"&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Add all the new public hostnames (&lt;code&gt;grafana.your-domain.com&lt;/code&gt;, &lt;code&gt;homer.your-domain.com&lt;/code&gt;, etc.) to the "Public hostname" section.&lt;/li&gt;
&lt;li&gt; On the next page, create one simple policy:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Policy name:&lt;/strong&gt; &lt;code&gt;Allow-Admin-Only&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Action:&lt;/strong&gt; &lt;code&gt;Allow&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rule:&lt;/strong&gt; &lt;code&gt;Include&lt;/code&gt;, &lt;code&gt;Emails&lt;/code&gt;, &lt;code&gt;your-email@example.com&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; Click "Save application".&lt;/li&gt;
&lt;/ol&gt;
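
&lt;p&gt;Once the Access policy is active, an unauthenticated request should be redirected to the team login page instead of reaching the service. One way to spot-check this from outside the home network (the team name is whatever was chosen in Chapter 2):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Expect a 302 redirect to https://&amp;lt;team-name&amp;gt;.cloudflareaccess.com/...
curl -sI https://grafana.your-domain.com | grep -i '^location'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
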

&lt;h3&gt;
  
  
  Conclusion: From Bare Metal to a Secure, Global Homelab
&lt;/h3&gt;

&lt;p&gt;And with that, we've placed the final and most important piece of our puzzle: professional-grade security.&lt;/p&gt;

&lt;p&gt;Let's step back and appreciate what we've built. On our home network, we have seamless, direct access to our services with valid SSL certificates. The moment we step outside our home, our lab becomes a secure fortress. Our services are completely invisible to the public internet, hidden behind Cloudflare's robust Zero Trust authentication.&lt;/p&gt;

&lt;p&gt;We have successfully achieved the gold standard of modern, secure infrastructure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero-Trust Access:&lt;/strong&gt; Only authenticated users can even &lt;em&gt;see&lt;/em&gt; our login pages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A "Closed-Port" Firewall:&lt;/strong&gt; We've done it all &lt;strong&gt;without opening a single port on our router&lt;/strong&gt;, eliminating one of the single greatest security risks for any homelab.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Global Accessibility:&lt;/strong&gt; We can securely access our tools from anywhere in the world, just like a professional enterprise service.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Looking Back: Our 5-Part Journey
&lt;/h3&gt;

&lt;p&gt;It's amazing to see how far we've come. In this series, we started with nothing but a powered-off machine and an idea.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In &lt;strong&gt;Part 1&lt;/strong&gt;, we built our server from the ground up with a minimal &lt;strong&gt;Debian&lt;/strong&gt; install utilizing our old laptop.&lt;/li&gt;
&lt;li&gt;In &lt;strong&gt;Part 2&lt;/strong&gt;, we hardened its security with &lt;strong&gt;SSH keys, UFW, and Fail2Ban&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;In &lt;strong&gt;Part 3&lt;/strong&gt;, we unleashed its potential with &lt;strong&gt;Docker&lt;/strong&gt; and deployed our first service, &lt;strong&gt;AdGuard Home&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;In &lt;strong&gt;Part 4&lt;/strong&gt;, we solved internal networking with a &lt;strong&gt;local DNS server&lt;/strong&gt; for clean, "at-home" SSL.&lt;/li&gt;
&lt;li&gt;And finally, in &lt;strong&gt;Part 5&lt;/strong&gt;, we've secured it for the entire world with &lt;strong&gt;Cloudflare Tunnels&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We have successfully built a stable, secure, and powerful foundation. This server is no longer just a project; it's a platform ready to host any idea we can dream up.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's Next?
&lt;/h3&gt;

&lt;p&gt;Thank you so much for following along on this journey. I hope this guide has been valuable and has empowered you to build your own private corner of the internet.&lt;/p&gt;

&lt;p&gt;For now, this concludes our setup series. We've built the "house" and secured it. The next logical step, and the subject for a whole new series, is to learn how to manage it. What if we want to deploy 10 services? What if this server fails and we need to rebuild it in minutes, not hours?&lt;/p&gt;

&lt;p&gt;Our foundation is set. The next adventure will be in the world of automation and orchestration, using powerful tools like &lt;strong&gt;Ansible (Infrastructure as Code)&lt;/strong&gt; to automate our setup and &lt;strong&gt;Kubernetes (K3s)&lt;/strong&gt; to manage our containerized applications at scale.&lt;/p&gt;

&lt;p&gt;Stay tuned, and happy building!&lt;/p&gt;

</description>
      <category>linux</category>
      <category>cloud</category>
      <category>docker</category>
      <category>security</category>
    </item>
    <item>
      <title>Part 4: Automating a Homelab with Backups, Updates, and Alerts</title>
      <dc:creator>Prajwol Adhikari</dc:creator>
      <pubDate>Sun, 10 May 2026 00:48:25 +0000</pubDate>
      <link>https://forem.com/prajwol-ad/part-4-automating-a-homelab-with-backups-updates-and-alerts-58d0</link>
      <guid>https://forem.com/prajwol-ad/part-4-automating-a-homelab-with-backups-updates-and-alerts-58d0</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Welcome to Part 4 of the homelab series! In the previous parts, we built a server, deployed a suite of services, and configured our network. Now, it's time to make it resilient and self-maintaining. A homelab isn't just about setting things up; it's about keeping them running reliably.&lt;/p&gt;

&lt;p&gt;This guide will show you how to set up the three pillars of modern IT operations: &lt;strong&gt;Automated Backups&lt;/strong&gt;, &lt;strong&gt;Automated Updates&lt;/strong&gt;, and &lt;strong&gt;Proactive Alerting&lt;/strong&gt;. By the end, you'll have a homelab that runs itself, ensures your data is safe, stays up-to-date, and notifies you when something goes wrong.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 1: The Automated Backup Strategy (at 3 AM)
&lt;/h3&gt;

&lt;p&gt;A solid backup strategy is non-negotiable. I implemented a robust system inspired by the "3-2-1" rule, focusing on redundancy and an off-site copy. My strategy involves maintaining &lt;strong&gt;two copies&lt;/strong&gt; of my data in &lt;strong&gt;two separate locations&lt;/strong&gt;: one local backup on the server itself for fast recovery, and one automated, off-site backup to Google Drive to protect against a local disaster like a fire or hardware failure.&lt;/p&gt;

&lt;p&gt;This script runs at 3 AM, creates a local backup, uploads it, and then notifies Discord.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 1: Configure &lt;code&gt;rclone&lt;/code&gt; for Google Drive&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;First, you need a tool to communicate with Google Drive. We'll use &lt;code&gt;rclone&lt;/code&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Install &lt;code&gt;rclone&lt;/code&gt;&lt;/strong&gt; on your Debian server:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="p"&gt;;&lt;/span&gt; curl https://rclone.org/install.sh | &lt;span class="nb"&gt;sudo &lt;/span&gt;bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Run the interactive setup:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;rclone config
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Follow the Prompts:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;n&lt;/code&gt; (New remote)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;name&amp;gt;&lt;/code&gt;: &lt;code&gt;gdrive&lt;/code&gt; (You can name it anything)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;storage&amp;gt;&lt;/code&gt;: Find and select &lt;code&gt;drive&lt;/code&gt; (Google Drive).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;client_id&amp;gt;&lt;/code&gt; &amp;amp; &lt;code&gt;client_secret&amp;gt;&lt;/code&gt;: Press Enter for both to leave blank.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;scope&amp;gt;&lt;/code&gt;: Choose &lt;code&gt;1&lt;/code&gt; (Full access).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Use auto config? y/n&amp;gt;&lt;/code&gt;: This is a &lt;strong&gt;critical step&lt;/strong&gt;. Since we are on a headless server, type &lt;code&gt;n&lt;/code&gt; and press Enter.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Authorize Headless:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;rclone&lt;/code&gt; will give you a command to run on a machine &lt;em&gt;with a web browser&lt;/em&gt; (like your main computer).&lt;/li&gt;
&lt;li&gt;On your main computer (where you have &lt;code&gt;rclone&lt;/code&gt; installed), run the &lt;code&gt;rclone authorize "drive" "..."&lt;/code&gt; command.&lt;/li&gt;
&lt;li&gt;This will open your browser, ask you to log in to Google, and grant permission.&lt;/li&gt;
&lt;li&gt;Your main computer's terminal will then output a block of text (your &lt;code&gt;config_token&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Paste Token:&lt;/strong&gt; Copy the token from your main computer and paste it back into your server's &lt;code&gt;rclone&lt;/code&gt; prompt.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finish the prompts, and your connection is complete.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
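&lt;p&gt;Once the prompts finish, a quick sanity check confirms the remote is usable. A sketch (the &lt;code&gt;gdrive&lt;/code&gt; name is whatever you chose above; the &lt;code&gt;command -v&lt;/code&gt; guard just makes the snippet safe to paste on a machine without rclone):&lt;/p&gt;

```shell
# Verify the new remote: list configured remotes, then the Drive root folders
if command -v rclone >/dev/null 2>&1; then
  rclone listremotes                                    # should include: gdrive:
  rclone lsd gdrive: || echo "remote 'gdrive' not reachable yet"
else
  echo "rclone not installed"
fi
```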

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 2: Create the Backup Script&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Next, create a shell script to perform the backup.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Create the file and make it executable:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nano ~/backup.sh
&lt;span class="nb"&gt;chmod&lt;/span&gt; +x ~/backup.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Paste in the following script. &lt;strong&gt;You must edit the configuration variables&lt;/strong&gt; (the paths, retention settings, rclone remote and destination, and the Discord webhook URL) to match your setup.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="c"&gt;# --- Configuration ---&lt;/span&gt;
&lt;span class="nv"&gt;SOURCE_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/path/to/your/docker"&lt;/span&gt;  &lt;span class="c"&gt;# &amp;lt;-- Change to your Docker projects directory&lt;/span&gt;
&lt;span class="nv"&gt;BACKUP_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/path/to/your/backups"&lt;/span&gt;  &lt;span class="c"&gt;# &amp;lt;-- Change to your backups folder&lt;/span&gt;
&lt;span class="nv"&gt;FILENAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"homelab-backup-&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y-%m-%d&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;.tar.gz"&lt;/span&gt;
&lt;span class="nv"&gt;LOCAL_RETENTION_DAYS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3
&lt;span class="nv"&gt;CLOUD_RETENTION_DAYS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3
&lt;span class="nv"&gt;RCLONE_REMOTE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"gdrive"&lt;/span&gt;  &lt;span class="c"&gt;# &amp;lt;-- Must match your rclone remote name&lt;/span&gt;
&lt;span class="nv"&gt;RCLONE_DEST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Homelab Backups"&lt;/span&gt;  &lt;span class="c"&gt;# &amp;lt;-- Folder name in Google Drive&lt;/span&gt;

&lt;span class="c"&gt;# --- "https://discordapp.com/api/webhooks/141949178941/6Tx6f1yjf26LztQ" ---&lt;/span&gt;
&lt;span class="nv"&gt;DISCORD_WEBHOOK_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"YOUR_DISCORD_WEBHOOK_URL"&lt;/span&gt;

&lt;span class="c"&gt;# --- Notification Function ---&lt;/span&gt;
send_notification&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;MESSAGE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;
    curl &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;content&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt;$MESSAGE&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DISCORD_WEBHOOK_URL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# --- Script Logic ---&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"--- Starting Homelab Backup: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt; ---"&lt;/span&gt;
send_notification &lt;span class="s2"&gt;"✅ Starting Homelab Backup..."&lt;/span&gt;

&lt;span class="c"&gt;# 1. Create local backup&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Creating local backup..."&lt;/span&gt;
&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-czf&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BACKUP_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;FILENAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-C&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SOURCE_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Local backup created at &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BACKUP_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;FILENAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# 2. Upload to Google Drive&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Uploading backup to &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RCLONE_REMOTE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;..."&lt;/span&gt;
rclone copy &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BACKUP_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;FILENAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RCLONE_REMOTE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RCLONE_DEST&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Upload complete."&lt;/span&gt;

&lt;span class="c"&gt;# 3. Clean up local backups&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Cleaning up local backups older than &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;LOCAL_RETENTION_DAYS&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; days..."&lt;/span&gt;
find &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BACKUP_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-type&lt;/span&gt; f &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="s2"&gt;"*.tar.gz"&lt;/span&gt; &lt;span class="nt"&gt;-mtime&lt;/span&gt; +&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;LOCAL_RETENTION_DAYS&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;-delete&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Local cleanup complete."&lt;/span&gt;

&lt;span class="c"&gt;# 4. Clean up cloud backups&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Cleaning up cloud backups older than &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CLOUD_RETENTION_DAYS&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; days..."&lt;/span&gt;
rclone delete &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RCLONE_REMOTE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RCLONE_DEST&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--min-age&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CLOUD_RETENTION_DAYS&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;d
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Cloud cleanup complete."&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Backup process finished."&lt;/span&gt;
send_notification &lt;span class="s2"&gt;"🎉 Homelab backup and cloud upload completed successfully!"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;
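&lt;p&gt;A backup you have never restored is only a hope. The script's &lt;code&gt;tar&lt;/code&gt; step can be sanity-checked with a self-contained round trip; this sketch uses temporary directories rather than your real paths:&lt;/p&gt;

```shell
#!/bin/bash
# Round-trip test: archive a sample tree, extract it, and diff (temp dirs only)
set -euo pipefail
SRC=$(mktemp -d)
DEST=$(mktemp -d)
echo "hello" > "$SRC/docker-compose.yml"
tar -czf "$DEST/test-backup.tar.gz" -C "$SRC" .
mkdir -p "$DEST/restore"
tar -xzf "$DEST/test-backup.tar.gz" -C "$DEST/restore"
diff -q "$SRC/docker-compose.yml" "$DEST/restore/docker-compose.yml" && echo "restore OK"
```

&lt;p&gt;The same pattern works against a real archive, and &lt;code&gt;tar -tzf&lt;/code&gt; lists an archive's contents without extracting anything.&lt;/p&gt;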

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 3: Automate with Cron&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;To run this script automatically, add it to the &lt;code&gt;root&lt;/code&gt; user's crontab. Running it as root matters: many Docker volume files are root-owned, and the script needs permission to read all of them.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Open the root crontab editor:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;crontab &lt;span class="nt"&gt;-e&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add the following line to schedule the backup for 3:00 AM every morning:&lt;br&gt;
&lt;code&gt;0 3 * * * /path/to/your/backup.sh&lt;/code&gt;&lt;br&gt;
You will now get fresh on-site and off-site backups every night, plus a Discord message when the run finishes.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
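&lt;p&gt;If you want a record of each run, append a redirect to the same crontab entry so the script's output lands in a log file (the log path here is illustrative):&lt;/p&gt;

```
0 3 * * * /path/to/your/backup.sh >> /var/log/homelab-backup.log 2>&1
```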




&lt;h3&gt;
  
  
  Chapter 2: Automated Updates with Watchtower (at 6 AM)
&lt;/h3&gt;

&lt;p&gt;Manually updating every Docker container is tedious. We can automate this by deploying &lt;strong&gt;Watchtower&lt;/strong&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 1: The Docker Compose File&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Create a &lt;code&gt;docker-compose.yml&lt;/code&gt; for Watchtower. This configuration schedules it to run once a day at 6:00 AM, clean up old images, and send a Discord notification &lt;em&gt;only&lt;/em&gt; if it finds an update.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;code&gt;mkdir -p ~/docker/watchtower&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; &lt;code&gt;cd ~/docker/watchtower&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; &lt;code&gt;nano docker-compose.yml&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Paste in this configuration:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;watchtower&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;containrrr/watchtower&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;watchtower&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/var/run/docker.sock:/var/run/docker.sock&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Timezone setting&lt;/span&gt;
        &lt;span class="na"&gt;TZ&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;America/Chicago&lt;/span&gt;

        &lt;span class="c1"&gt;# Discord notification settings&lt;/span&gt;
        &lt;span class="na"&gt;WATCHTOWER_NOTIFICATIONS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;shoutrrr&lt;/span&gt;
        &lt;span class="na"&gt;WATCHTOWER_NOTIFICATION_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;discord://YOUR_DISCORD_WEBHOOK_ID_URL&amp;gt;&lt;/span&gt;

        &lt;span class="s"&gt;#&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Notification&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;settings&lt;/span&gt;
        &lt;span class="s"&gt;WATCHTOWER_NOTIFICATIONS_LEVEL:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;info&lt;/span&gt;
        &lt;span class="s"&gt;WATCHTOWER_NOTIFICATION_REPORT:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
        &lt;span class="na"&gt;WATCHTOWER_NOTIFICATIONS_HOSTNAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Homelab-Laptop&lt;/span&gt;

        &lt;span class="c1"&gt;# Update settings&lt;/span&gt;
        &lt;span class="na"&gt;WATCHTOWER_CLEANUP&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
        &lt;span class="na"&gt;WATCHTOWER_INCLUDE_STOPPED&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;false"&lt;/span&gt;
        &lt;span class="na"&gt;WATCHTOWER_INCLUDE_RESTARTING&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
        &lt;span class="na"&gt;WATCHTOWER_SCHEDULE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;6&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;em&gt;Note: The &lt;code&gt;WATCHTOWER_NOTIFICATION_URL&lt;/code&gt; uses a special &lt;code&gt;shoutrrr&lt;/code&gt; format for Discord, which looks like &lt;code&gt;discord://token@webhook-id&lt;/code&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
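&lt;p&gt;If it helps to see that mapping concretely, the shoutrrr string is just the webhook URL's token and ID swapped around. A sketch with a dummy webhook URL:&lt;/p&gt;

```shell
# Build a shoutrrr discord:// URL from a standard webhook URL (dummy values)
webhook="https://discord.com/api/webhooks/1234567890/AbCdEfTokenXyZ"
id=$(echo "$webhook" | cut -d/ -f6)      # 6th slash-separated field: webhook ID
token=$(echo "$webhook" | cut -d/ -f7)   # 7th field: webhook token
shoutrrr_url="discord://${token}@${id}"
echo "$shoutrrr_url"    # discord://AbCdEfTokenXyZ@1234567890
```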

&lt;p&gt;Now, every morning at 6:00 AM, Watchtower will scan all running containers and update any that have a new image available.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 3: Proactive Alerting (24/7)
&lt;/h3&gt;

&lt;p&gt;The final piece of automation is proactive alerting. This setup ensures you are immediately notified via Discord if something goes wrong.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 1: The Alerting Pipeline&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The pipeline we'll build is: &lt;strong&gt;Prometheus&lt;/strong&gt; (detects problems) -&amp;gt; &lt;strong&gt;Alertmanager&lt;/strong&gt; (groups and routes alerts) -&amp;gt; &lt;strong&gt;Discord&lt;/strong&gt; (notifies you).&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 2: Deploy Alertmanager&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;First, deploy Alertmanager. It must be on the same &lt;code&gt;npm_default&lt;/code&gt; network as Prometheus.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;code&gt;mkdir -p ~/docker/alertmanager&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; &lt;code&gt;cd ~/docker/alertmanager&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create the &lt;code&gt;alertmanager.yml&lt;/code&gt; configuration file:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nano alertmanager.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Paste in this configuration. It uses advanced routing to send &lt;code&gt;critical&lt;/code&gt; alerts every 2 hours and &lt;code&gt;warning&lt;/code&gt; alerts every 12 hours.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;global&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;resolve_timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;

&lt;span class="na"&gt;route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;group_by&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;alertname"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;severity"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;group_wait&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;
  &lt;span class="na"&gt;group_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10m&lt;/span&gt;
  &lt;span class="na"&gt;repeat_interbal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;12h&lt;/span&gt;
  &lt;span class="na"&gt;receiver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;discord-notifications"&lt;/span&gt;
  &lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;receiver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;discord-notifications"&lt;/span&gt;
      &lt;span class="na"&gt;matchers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;severity="critical"&lt;/span&gt;
      &lt;span class="na"&gt;repeat_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2h&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;receiver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;discord-notifications"&lt;/span&gt;
      &lt;span class="na"&gt;matchers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;severity="warning"&lt;/span&gt;
      &lt;span class="na"&gt;repeat_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;12h&lt;/span&gt;

&lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;discord-notifications"&lt;/span&gt;
    &lt;span class="na"&gt;discord_configs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;webhook_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_DISCORD_WEBHOOK_URL"&lt;/span&gt;
        &lt;span class="na"&gt;send_resolved&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Now create the &lt;code&gt;docker-compose.yml&lt;/code&gt; for Alertmanager:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nano docker-compose.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Paste in the following:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;alertmanager&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prom/alertmanager:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;alertmanager&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./alertmanager.yml:/etc/alertmanager/alertmanager.yml&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;npm_default&lt;/span&gt;

&lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;npm_default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;external&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Launch it: &lt;code&gt;docker compose up -d&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 3: Configure Prometheus&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Finally, tell Prometheus to send alerts to Alertmanager and load your rules.&lt;/p&gt;
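&lt;p&gt;For reference, the wiring in &lt;code&gt;prometheus.yml&lt;/code&gt; looks like this (a sketch: &lt;code&gt;alertmanager:9093&lt;/code&gt; assumes the container name from the compose file above and Alertmanager's default port):&lt;/p&gt;

```
# prometheus.yml additions
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

rule_files:
  - /etc/prometheus/alert_rules.yml
```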

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Create your rules file, &lt;code&gt;~/docker/monitoring/alert_rules.yml&lt;/code&gt;, with rules for "Instance Down," "High CPU," "Low Disk Space," etc.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ~/docker/monitoring
nano alert_rules.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add the &lt;code&gt;alert_rules.yml&lt;/code&gt; as a volume in your &lt;code&gt;~/docker/monitoring/docker-compose.yml&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./prometheus.yml:/etc/prometheus/prometheus.yml&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./alert_rules.yml:/etc/prometheus/alert_rules.yml&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;prometheus_data:/prometheus&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Paste the following rules into &lt;code&gt;alert_rules.yml&lt;/code&gt; (and make sure your &lt;code&gt;~/docker/monitoring/prometheus.yml&lt;/code&gt; has &lt;code&gt;alerting&lt;/code&gt; and &lt;code&gt;rule_files&lt;/code&gt; blocks pointing at Alertmanager and at this file):&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;groups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Critical System Alerts&lt;/span&gt;
  &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;InstanceDown&lt;/span&gt;
    &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;up == &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
    &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2m&lt;/span&gt;
    &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical&lt;/span&gt;
    &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🔴&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Instance&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$labels.instance&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;DOWN"&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Service&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$labels.job&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;has&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;been&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;unreachable&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;minutes."&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LaptopOnBattery&lt;/span&gt;
      &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node_power_supply_online == &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
      &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🔋&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Server&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;running&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;BATTERY"&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Homelab&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;has&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;been&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;unplugged&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;5&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;minutes.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Check&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;power&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;connection!"&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LowBatteryLevel&lt;/span&gt;
      &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node_power_supply_capacity &amp;lt; 20 and node_power_supply_online == &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
      &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1m&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;⚠️&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CRITICAL:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Battery&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;at&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}%"&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Battery&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;below&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;20%.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Server&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;may&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;shut&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;down&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;soon!"&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DiskAlmostFull&lt;/span&gt;
      &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;(node_filesystem_avail_bytes{mountpoint="/",fstype!="tmpfs"} / node_filesystem_size_bytes{mountpoint="/",fstype!="tmpfs"}) * 100 &amp;lt; &lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;💾&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Disk&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;space&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;critically&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;low:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;humanize&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}%&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;remaining"&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Root&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;filesystem&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;has&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;less&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;than&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;10%&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;free&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;space."&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;OutOfMemory&lt;/span&gt;
      &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 &amp;lt; &lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;
      &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2m&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🧠&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Memory&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;critically&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;low:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;humanize&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}%&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;available"&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Less&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;than&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;5%&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;memory&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;available.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;System&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;may&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;become&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;unresponsive."&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CriticalCpuTemperature&lt;/span&gt;
      &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node_hwmon_temp_celsius{chip="coretemp"} &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;95&lt;/span&gt;
      &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2m&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🔥&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CRITICAL&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CPU&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Temperature:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}°C"&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CPU&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;exceeds&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;95°C.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Thermal&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;throttling&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;or&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;shutdown&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;imminent!"&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Warning System Alerts&lt;/span&gt;
  &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1m&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HighCpuUsage&lt;/span&gt;
      &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;⚡&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;High&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CPU&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;usage:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;humanize&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}%"&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CPU&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;usage&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;above&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;80%&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;5&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;minutes&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$labels.instance&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HighSystemLoad&lt;/span&gt;
      &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node_load5 / on(instance) count(node_cpu_seconds_total{mode="idle"}) by (instance) &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;1.5&lt;/span&gt;
      &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10m&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;📊&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;High&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;load:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;humanize&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5-minute&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;load&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;average&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1.5x&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CPU&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cores&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;minutes."&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HighMemoryUsage&lt;/span&gt;
      &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 &amp;lt; &lt;/span&gt;&lt;span class="m"&gt;20&lt;/span&gt;
      &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🧠&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;High&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;memory&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;usage:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;humanize&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}%&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;available"&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Less&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;than&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;20%&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;memory&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;available."&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HighCpuTemperature&lt;/span&gt;
      &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node_hwmon_temp_celsius{chip="coretemp"} &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;85&lt;/span&gt;
      &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🌡️&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;High&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CPU&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;temperature:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}°C"&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CPU&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;above&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;85°C.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Consider&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;improving&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cooling."&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HighNvmeTemperature&lt;/span&gt;
      &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node_hwmon_temp_celsius{chip="nvme"} &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;65&lt;/span&gt;
      &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10m&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;💿&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;High&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;NVMe&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;temperature:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}°C"&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NVMe&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;drive&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;above&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;65°C&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;minutes."&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DiskSpaceLow&lt;/span&gt;
      &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;(node_filesystem_avail_bytes{mountpoint="/",fstype!="tmpfs"} / node_filesystem_size_bytes{mountpoint="/",fstype!="tmpfs"}) * 100 &amp;lt; &lt;/span&gt;&lt;span class="m"&gt;20&lt;/span&gt;
      &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10m&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;💾&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Disk&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;space&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;low:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;humanize&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}%&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;remaining"&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Root&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;filesystem&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;has&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;less&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;than&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;20%&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;free&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;space."&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HighSwapUsage&lt;/span&gt;
      &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;((node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes) / node_memory_SwapTotal_bytes * 100) &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;50&lt;/span&gt;
      &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10m&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;💱&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;High&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;swap&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;usage:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;humanize&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}%"&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Swap&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;usage&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;above&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;50%.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;System&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;may&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;be&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;memory-constrained."&lt;/span&gt;

    &lt;span class="c1"&gt;# Monitor your USB-C hub ethernet adapter (enx00)&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;EthernetInterfaceDown&lt;/span&gt;
      &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node_network_up{device="enx00"} == &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
      &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2m&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🌐&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;USB-C&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Ethernet&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;adapter&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;DISCONNECTED"&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;USB-C&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;hub&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ethernet&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;connection&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;(enx00)&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;down.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Check&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cable&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;or&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;hub."&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HighNetworkErrors&lt;/span&gt;
      &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rate(node_network_receive_errs_total{device="enx00"}[5m]) &amp;gt; 10 or rate(node_network_transmit_errs_total{device="enx00"}[5m]) &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🌐&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;High&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;network&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;errors&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;USB-C&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ethernet"&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ethernet&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;adapter&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;experiencing&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;rate.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Check&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cable&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;quality."&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Docker Container Alerts&lt;/span&gt;
  &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1m&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Simplified alert - just checks if container exporter is working&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ContainerMonitoringDown&lt;/span&gt;
      &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;absent(container_last_seen)&lt;/span&gt;
      &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2m&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🐳&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Container&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;monitoring&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;down"&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cAdvisor&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;or&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;container&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;metrics&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;are&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;not&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;available.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Check&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;if&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;containers&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;are&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;being&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;monitored."&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ContainerRestarting&lt;/span&gt;
      &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rate(container_start_time_seconds[5m]) &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;0.01&lt;/span&gt;
      &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2m&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🐳&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Container&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$labels.name&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;restarting"&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Container&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$labels.name&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;has&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;restarted&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;recently."&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ContainerHighCpu&lt;/span&gt;
      &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rate(container_cpu_usage_seconds_total{name!~".*POD.*",name!=""}[5m]) * 100 &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10m&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🐳&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Container&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$labels.name&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CPU:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;humanize&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}%"&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Container&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CPU&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;usage&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;above&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;80%&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;minutes."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Restart Prometheus to apply the changes:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ~/docker/monitoring
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--force-recreate&lt;/span&gt; prometheus
&lt;/code&gt;&lt;/pre&gt;


&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 3: The Critical Firewall Fix&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;You may find your alerts are never delivered. This is often caused by a conflict between Docker and &lt;code&gt;ufw&lt;/code&gt;: with &lt;code&gt;ufw&lt;/code&gt;'s default forward policy set to &lt;code&gt;DROP&lt;/code&gt;, traffic forwarded from containers out to the internet can be silently blocked.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Open the main &lt;code&gt;ufw&lt;/code&gt; configuration file:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/default/ufw
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Change &lt;code&gt;DEFAULT_FORWARD_POLICY="DROP"&lt;/code&gt; to &lt;code&gt;DEFAULT_FORWARD_POLICY="ACCEPT"&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Reload the firewall:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;ufw reload
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Restart your containers that need internet access:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose restart
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;
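&lt;p&gt;If you prefer to script this change rather than edit the file in &lt;code&gt;nano&lt;/code&gt;, the same edit can be made non-interactively with &lt;code&gt;sed&lt;/code&gt;. This is a small sketch assuming the stock &lt;code&gt;/etc/default/ufw&lt;/code&gt; layout; the helper takes the file path as an argument so you can rehearse it on a copy first.&lt;/p&gt;

```shell
# Flip ufw's default forward policy from DROP to ACCEPT, in place.
# The path is passed explicitly so the change can be tried on a copy.
set_forward_accept() {
  local ufw_file="$1"
  sed -i 's/^DEFAULT_FORWARD_POLICY="DROP"$/DEFAULT_FORWARD_POLICY="ACCEPT"/' "$ufw_file"
}
```

&lt;p&gt;On the server itself, run it against &lt;code&gt;/etc/default/ufw&lt;/code&gt; with &lt;code&gt;sudo&lt;/code&gt;, then reload the firewall and restart your containers as in steps 3 and 4.&lt;/p&gt;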

&lt;p&gt;Now, if any service fails or your server's resources run low, you will get an instant notification in Discord.&lt;/p&gt;




&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Our homelab has now truly come to life. It's no longer just a collection of services but a resilient, self-maintaining platform. With automated backups to Google Drive, daily updates via Watchtower, and proactive alerts with Prometheus and Alertmanager, our server can now run 24/7 with minimal manual intervention. We've built a solid, reliable, and intelligent system.&lt;/p&gt;

&lt;p&gt;But there's one critical piece still missing: end-to-end security for our local services.&lt;/p&gt;

&lt;p&gt;Right now, we're accessing our dashboards at addresses like &lt;code&gt;http://grafana.local&lt;/code&gt;, which browsers flag as "Not Secure." What if we could use a real, public domain name for our &lt;em&gt;internal&lt;/em&gt; services and get a valid HTTPS certificate, all without opening a single port on our router?&lt;/p&gt;

&lt;p&gt;In the next part of this series, I'll show you exactly how to do that. We'll dive into an advanced but powerful setup using Cloudflare and Nginx Proxy Manager to bring trusted, zero-exposure SSL to everything we've built.&lt;/p&gt;

&lt;p&gt;Stay tuned!&lt;/p&gt;

</description>
      <category>linux</category>
      <category>discord</category>
      <category>automation</category>
      <category>homelab</category>
    </item>
    <item>
      <title>Part 3: A High-Availability DNS Network with AdGuard Home</title>
      <dc:creator>Prajwol Adhikari</dc:creator>
      <pubDate>Sun, 10 May 2026 00:44:45 +0000</pubDate>
      <link>https://forem.com/prajwol-ad/part-3-a-high-availability-dns-network-with-adguard-home-39p4</link>
      <guid>https://forem.com/prajwol-ad/part-3-a-high-availability-dns-network-with-adguard-home-39p4</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Welcome to Part 3 of my homelab series! In the previous parts, I built my server and deployed a suite of management and monitoring tools. Now, it's time to build the brain of my network: a robust, redundant, and high-availability DNS system using &lt;strong&gt;AdGuard Home&lt;/strong&gt; that works both at home and on the go.&lt;/p&gt;

&lt;p&gt;In this detailed guide, I'll walk you through how I deployed a total of &lt;strong&gt;three&lt;/strong&gt; AdGuard Home instances, each with its own unique IP address. I set up a primary resolver on my homelab, a secondary failover resolver in the cloud for my mobile devices, and a tertiary resolver on a separate virtual network for local redundancy.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 1: The Local Workhorse (Primary DNS)
&lt;/h3&gt;

&lt;p&gt;I started by deploying my main, day-to-day DNS resolver on my homelab server.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 1: Deploying AdGuard Home with Docker Compose&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;First, I SSHed into my server, created a directory for the project, and a &lt;code&gt;docker-compose.yml&lt;/code&gt; file to define the service.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/docker/adguard-primary
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/docker/adguard-primary
nano docker-compose.yml

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I pasted in the following configuration. This runs the AdGuard Home container, maps all the necessary ports for DNS and the web UI, and connects it to the shared &lt;code&gt;npm_default&lt;/code&gt; network I set up in Part 2.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;adguardhome&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;adguard/adguardhome:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;adguard-primary&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;53:53/tcp"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;53:53/udp"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:80/tcp"&lt;/span&gt;      &lt;span class="c1"&gt;# Web UI&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;853:853/tcp"&lt;/span&gt;      &lt;span class="c1"&gt;# DNS-over-TLS&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3000:3000/tcp"&lt;/span&gt;    &lt;span class="c1"&gt;# First-run setup wizard&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./workdir:/opt/adguardhome/work&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./confdir:/opt/adguardhome/conf&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;npm_default&lt;/span&gt;

&lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;npm_default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;external&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I then launched the container by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;strong&gt;Step 2: Initial AdGuard Home Setup Wizard&lt;/strong&gt;
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;I navigated to &lt;code&gt;http://&amp;lt;your-server-ip&amp;gt;:3000&lt;/code&gt; in my web browser to start the setup wizard.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I clicked "Get Started."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On the "Admin Web Interface" screen, I changed the "Listen Interface" to &lt;code&gt;All interfaces&lt;/code&gt; and the port to &lt;code&gt;80&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On the "DNS server" screen, I changed the "Listen Interface" to &lt;code&gt;All interfaces&lt;/code&gt; and left the port as &lt;code&gt;53&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I followed the prompts to create my admin username and password.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once the setup was complete, I was redirected to my main dashboard, now available at &lt;code&gt;http://&amp;lt;your-server-ip&amp;gt;:8080&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 3: Configure My Home Router&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;To make all my devices use AdGuard automatically, I logged into my home router's admin panel, found the &lt;strong&gt;DHCP Server&lt;/strong&gt; settings, and changed the &lt;strong&gt;Primary DNS Server&lt;/strong&gt; to my homelab's static IP address (e.g., &lt;code&gt;192.168.1.10&lt;/code&gt;).&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 2: The Cloud Failover (Secondary DNS on Oracle Cloud)
&lt;/h3&gt;

&lt;p&gt;An off-site DNS server ensures I have ad-blocking on my mobile devices and acts as a backup.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Why I Chose Oracle Cloud&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;After testing the free tiers of both AWS and Linode, I chose &lt;strong&gt;Oracle Cloud Infrastructure (OCI)&lt;/strong&gt;. In my experience, OCI's "Always Free" tier is far more generous with its resources. It provides powerful Ampere A1 Compute instances with up to 4 CPU cores and 24 GB of RAM, plus 200 GB of storage and significant bandwidth, all for free. This was ideal for running my service 24/7 without the strict limitations or eventual costs associated with other providers.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 1: Launching the Oracle Cloud VM&lt;/strong&gt;
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sign Up:&lt;/strong&gt; I created my account on the Oracle Cloud website.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create VM Instance:&lt;/strong&gt; In the OCI console, I navigated to Compute &amp;gt; Instances and clicked "Create instance".&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Configure Instance:&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- **Name:** I gave it a name like `AdGuard-Cloud`.

- **Image and Shape:** I clicked "Edit". For the image, I selected Ubuntu. For the shape, I selected "Ampere" and chose the `VM.Standard.A1.Flex` shape (it's "Always Free-eligible").

- **Networking:** I used the default VCN and made sure "Assign a public IPv4 address" was checked.

- **SSH Keys:** I added my SSH public key.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol start="4"&gt;
&lt;li&gt;I clicked &lt;strong&gt;Create&lt;/strong&gt;. Once the instance was running, I took note of its &lt;strong&gt;Public IP Address&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 2: Configuring the Cloud Firewall&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;For maximum security, I locked down the administrative ports to only my home IP address.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Find My Public IP:&lt;/strong&gt; I went to a site like &lt;code&gt;whatismyip.com&lt;/code&gt; and copied my home's public IP address.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Edit Security List:&lt;/strong&gt; I navigated to my instance's details page, clicked the subnet link, then clicked the "Security List" link.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I clicked "Add Ingress Rules" and added the following rules:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- **For SSH (Port 22):** I set the Source to my home's public IP, followed by `/32` (e.g., `203.0.113.55/32`). This is a critical security step.

- **For AdGuard Setup (Port 3000):** I also set the Source to my home's public IP with `/32`.

- **For AdGuard Web UI (Port 80/443):** I set the Source to my home's public IP with `/32` as well.

- **For Public DNS (Port 53, 853, etc.):** I set the Source to `0.0.0.0/0` (Anywhere) to allow all my devices to connect from any network.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Step 3: Installing AdGuard Home &amp;amp; Configuring SSL&lt;/strong&gt;
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Connect via SSH:&lt;/strong&gt; I used the public IP and my SSH key to connect to the VM.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Run Install Script:&lt;/strong&gt; I chose to install AdGuard Home directly on the OS for this instance.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-S&lt;/span&gt; &lt;span class="nt"&gt;-L&lt;/span&gt; https://raw.githubusercontent.com/AdguardTeam/AdGuardHome/master/scripts/install.sh | sh &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;blockquote&gt;
&lt;p&gt;The script will give you a link, like &lt;code&gt;http://YOUR_INSTANCE_IP:3000&lt;/code&gt;. Open this in your browser. Follow the on-screen steps to create your admin username and password.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Get a Hostname:&lt;/strong&gt; I went to &lt;strong&gt;No-IP.com&lt;/strong&gt;, created a free hostname (e.g., &lt;code&gt;my-cloud-dns.ddns.net&lt;/code&gt;), and pointed it to my cloud VM's public IP.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enable Encryption:&lt;/strong&gt; We'll use &lt;strong&gt;Let's Encrypt&lt;/strong&gt; and &lt;strong&gt;Certbot&lt;/strong&gt; to get a free SSL certificate, which lets us use secure &lt;code&gt;https://&lt;/code&gt; and encrypted DNS.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- **Install Certbot:** In your SSH session, run these commands:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```bash
sudo apt update
sudo apt install certbot -y
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- **Get the Certificate:** Run this command, replacing the email and domain with your own.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```bash
# This command will temporarily stop any service on port 80, get the certificate, and then finish.
sudo certbot certonly --standalone --agree-tos --email YOUR_EMAIL@example.com -d your-no-ip-hostname.ddns.net
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If it's successful, it will tell you where your certificate files are saved (usually in `/etc/letsencrypt/live/your-no-ip-hostname.ddns.net/`).

- **Configure AdGuard Home Encryption:**
  * Go to your AdGuard Home dashboard (**Settings -&amp;gt; Encryption settings**).
  * Check **"Enable encryption"**.
  * In the **"Server name"** field, enter your No-IP hostname.
  * Under **"Certificates"**, choose **"Set a certificates file path"**.
    * **Certificate path:** `/etc/letsencrypt/live/your-no-ip-hostname.ddns.net/fullchain.pem`
    * **Private key path:** `/etc/letsencrypt/live/your-no-ip-hostname.ddns.net/privkey.pem`
* Click **"Save configuration"**. The page will reload on a secure `https://` connection!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Step 4: Automating SSL Renewal (Cron Job)&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Let's Encrypt certificates last for 90 days. We can tell our server to automatically renew them.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Open Firewall (Port 80):&lt;/strong&gt; Certbot &lt;em&gt;requires&lt;/em&gt; &lt;strong&gt;port 80&lt;/strong&gt; for its renewal challenge. We must add this &lt;code&gt;ufw&lt;/code&gt; rule on our server, or the renewal will fail.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;ufw allow 80/tcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open the Cron Editor:&lt;/strong&gt; In SSH, run &lt;code&gt;sudo crontab -e&lt;/code&gt; and choose &lt;code&gt;nano&lt;/code&gt; as your editor.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Add the Renewal Job:&lt;/strong&gt; Add this line to the bottom of the file. It tells the server to try renewing the certificate every day at 2:30 AM.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;30 2 * * * certbot renew --quiet --pre-hook "systemctl stop AdGuardHome.service" --post-hook "systemctl start AdGuardHome.service"
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The &lt;code&gt;--post-hook&lt;/code&gt; is critical. It restarts AdGuard Home even if the renewal attempt fails, so the &lt;code&gt;--pre-hook&lt;/code&gt; stop can never leave you with a lasting DNS outage. Both hooks only run on days when a renewal is actually attempted.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Save and exit&lt;/strong&gt; (&lt;code&gt;Ctrl+X&lt;/code&gt;, then &lt;code&gt;Y&lt;/code&gt;, then &lt;code&gt;Enter&lt;/code&gt;). Your server will now keep its certificate fresh forever!&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
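&lt;p&gt;Between cron runs, you can check from the shell how much lifetime the certificate has left. This is an optional sketch using OpenSSL's built-in &lt;code&gt;-checkend&lt;/code&gt; test; the &lt;code&gt;live/&lt;/code&gt; path in the usage example assumes the No-IP hostname from earlier.&lt;/p&gt;

```shell
# cert_ok CERT [DAYS]: exit 0 when CERT remains valid for more than
# DAYS more days (default 14), non-zero otherwise. -checkend takes a
# window in seconds, hence the 86400 (seconds per day) multiplier.
cert_ok() {
  local cert="$1" days="${2:-14}"
  openssl x509 -checkend "$(( days * 86400 ))" -noout -in "$cert"
}
```

&lt;p&gt;For example: &lt;code&gt;cert_ok /etc/letsencrypt/live/my-cloud-dns.ddns.net/fullchain.pem 14 || echo "renewing soon"&lt;/code&gt;.&lt;/p&gt;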
&lt;h4&gt;
  
  
  &lt;strong&gt;Step 5: Creating a Cloud Backup (Snapshot)&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;A critical final step for any cloud service is creating a backup. Here is how I did it in OCI:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;In the OCI Console, I navigated to the details page for my &lt;code&gt;AdGuard-Cloud&lt;/code&gt; instance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Under the "Resources" menu on the left, I clicked on &lt;strong&gt;"Boot volume"&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On the Boot Volume details page, under "Resources," I clicked &lt;strong&gt;"Boot volume backups"&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I clicked the &lt;strong&gt;"Create boot volume backup"&lt;/strong&gt; button.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I gave the backup a descriptive name (e.g., &lt;code&gt;AdGuard-Cloud-Backup-YYYY-MM-DD&lt;/code&gt;) and clicked the create button. This creates a full snapshot of my server that I can use to restore it in minutes.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
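&lt;p&gt;If you take these backups regularly, a date-stamped name in the same format can be generated rather than typed by hand. A trivial convenience sketch (&lt;code&gt;AdGuard-Cloud&lt;/code&gt; is simply the instance name used in this guide):&lt;/p&gt;

```shell
# Build a backup name matching the AdGuard-Cloud-Backup-YYYY-MM-DD
# convention used above, stamped with today's date.
backup_name="AdGuard-Cloud-Backup-$(date +%Y-%m-%d)"
echo "$backup_name"
```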
&lt;h4&gt;
  
  
  &lt;strong&gt;Step 6: How to Use Your Cloud DNS on Mobile Devices&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The main benefit of the cloud server is having ad-blocking on the go. Here’s how I set it up on my mobile phone using secure, encrypted DNS.&lt;/p&gt;
&lt;h5&gt;
  
  
  &lt;strong&gt;For Android (Version 9+):&lt;/strong&gt;
&lt;/h5&gt;

&lt;p&gt;Modern Android has a built-in feature called "Private DNS" that uses DNS-over-TLS (DoT), which is perfect for this.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Open &lt;strong&gt;Settings&lt;/strong&gt; on your Android device.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tap on &lt;strong&gt;"Network &amp;amp; internet"&lt;/strong&gt; (this may be called "Connections" on some devices).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Find and tap on &lt;strong&gt;"Private DNS"&lt;/strong&gt;. You may need to look under an "Advanced" section.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select the option labeled &lt;strong&gt;"Private DNS provider hostname"&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the text box, enter the &lt;strong&gt;No-IP hostname&lt;/strong&gt; you created for your Oracle Cloud server (e.g., &lt;code&gt;my-cloud-dns.ddns.net&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tap &lt;strong&gt;Save&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your phone will now send all its DNS queries through an encrypted tunnel to your personal AdGuard Home server in the cloud, giving you ad-blocking on both Wi-Fi and cellular data.&lt;/p&gt;
&lt;h5&gt;
  
  
  &lt;strong&gt;For iOS (iPhone/iPad):&lt;/strong&gt;
&lt;/h5&gt;

&lt;p&gt;On iOS, the easiest way to set up encrypted DNS is by installing a configuration profile.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;On your iPhone or iPad, open Safari.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Go to a DNS profile generator site, like the one provided by AdGuard.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When prompted, enter the &lt;strong&gt;DNS-over-HTTPS (DoH)&lt;/strong&gt; address for your cloud server. It will be your No-IP hostname with &lt;code&gt;/dns-query&lt;/code&gt; at the end (e.g., &lt;code&gt;https://my-cloud-dns.ddns.net/dns-query&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Download the generated configuration profile.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Go to your device's &lt;strong&gt;Settings&lt;/strong&gt; app. You will see a new &lt;strong&gt;"Profile Downloaded"&lt;/strong&gt; item near the top. Tap on it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Follow the on-screen prompts to &lt;strong&gt;Install&lt;/strong&gt; the profile. You may need to enter your device passcode.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once installed, your iOS device will also route its DNS traffic through your secure cloud server.&lt;/p&gt;


&lt;h3&gt;
  
  
  Chapter 3: Ultimate Local Redundancy (Tertiary DNS with Macvlan)
&lt;/h3&gt;

&lt;p&gt;For an extra layer of redundancy &lt;em&gt;within&lt;/em&gt; my homelab, I created a third AdGuard instance. By using an advanced Docker network mode called &lt;strong&gt;macvlan&lt;/strong&gt;, this container gets its own unique IP address on my home network, making it a truly independent resolver.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create Macvlan Network:&lt;/strong&gt; First, I created the macvlan network, telling it which of my physical network cards to use (&lt;code&gt;eth0&lt;/code&gt; in my case).&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker network create &lt;span class="nt"&gt;-d&lt;/span&gt; macvlan &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--subnet&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;192.168.1.0/24 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--gateway&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;192.168.1.1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;parent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;eth0 homelab_net
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Deploy Tertiary Instance:&lt;/strong&gt; I created a new folder (&lt;code&gt;~/docker/adguard-tertiary&lt;/code&gt;) and this &lt;code&gt;docker-compose.yml&lt;/code&gt;. Notice there are no &lt;code&gt;ports&lt;/code&gt; since the container gets its own IP.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;adguardhome2&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;adguard/adguardhome:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;adguardhome2&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./work:/opt/adguardhome/work"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./conf:/opt/adguardhome/conf"&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;homelab_net&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;ipv4_address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;192.168.1.11&lt;/span&gt; &lt;span class="c1"&gt;# The new, unique IP for this container&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;

&lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;homelab_net&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;external&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Configure Router for Local Failover:&lt;/strong&gt; To complete the local redundancy, I went back into my router's DHCP settings.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- In the **Primary DNS** field, I have the IP of my main homelab server (e.g., `192.168.1.10`).

- In the **Secondary DNS** field, I entered the unique IP address I assigned to my macvlan container (e.g., `192.168.1.11`).


Now, if my primary AdGuard container has an issue, all devices on my network will automatically fail over to the tertiary instance.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h3&gt;
  
  
  Chapter 4: Fine-Tuning and Integration
&lt;/h3&gt;

&lt;p&gt;Finally, I implemented some best practices on my primary AdGuard Home instance.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Upstream DNS Servers:&lt;/strong&gt; Under &lt;strong&gt;Settings &amp;gt; DNS Settings&lt;/strong&gt;, I configured AdGuard to send requests to multiple resolvers in parallel for speed and reliability, using &lt;strong&gt;Cloudflare (&lt;code&gt;1.1.1.1&lt;/code&gt;)&lt;/strong&gt;, &lt;strong&gt;Google (&lt;code&gt;8.8.8.8&lt;/code&gt;)&lt;/strong&gt;, and &lt;strong&gt;Quad9 (&lt;code&gt;9.9.9.9&lt;/code&gt;)&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enable DNSSEC:&lt;/strong&gt; In the same settings page, I enabled DNSSEC to verify the integrity of DNS responses.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DNS Blocklists:&lt;/strong&gt; I added several popular lists from the "Filters &amp;gt; DNS blocklists" page, including the &lt;strong&gt;AdGuard DNS filter&lt;/strong&gt; and the &lt;strong&gt;OISD Blocklist&lt;/strong&gt;, for robust protection.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DNS Rewrites for Local Services:&lt;/strong&gt; This is the key to a clean homelab experience. For each service, I performed a detailed two-step process:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. &lt;strong&gt;Create the Proxy Host in Nginx Proxy Manager:&lt;/strong&gt; I logged into my NPM admin panel, went to &lt;strong&gt;Hosts &amp;gt; Proxy Hosts&lt;/strong&gt;, and clicked "Add Proxy Host". For my Homer dashboard, I set the &lt;strong&gt;Forward Hostname&lt;/strong&gt; to &lt;code&gt;homer&lt;/code&gt; (the container name) and the &lt;strong&gt;Forward Port&lt;/strong&gt; to &lt;code&gt;8080&lt;/code&gt; (its internal port), using &lt;code&gt;homer.local&lt;/code&gt; as the domain name.

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create the DNS Rewrite in AdGuard Home:&lt;/strong&gt; I logged into my primary AdGuard dashboard, went to &lt;strong&gt;Filters &amp;gt; DNS Rewrites&lt;/strong&gt;, and clicked "Add DNS rewrite". I entered &lt;code&gt;homer.local&lt;/code&gt; as the domain and the IP address of my Nginx Proxy Manager server as the answer.
&lt;/li&gt;
&lt;/ol&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
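&lt;p&gt;For reference, the upstream and DNSSEC choices above end up in AdGuard Home's &lt;code&gt;AdGuardHome.yaml&lt;/code&gt; (in the &lt;code&gt;confdir&lt;/code&gt; volume). The fragment below is illustrative only: the field names match recent AdGuard Home releases but may differ in older ones, so prefer making these changes through the web UI.&lt;/p&gt;

```yaml
# Illustrative AdGuardHome.yaml fragment (field names may vary by version).
dns:
  upstream_dns:
    - 1.1.1.1
    - 8.8.8.8
    - 9.9.9.9
  upstream_mode: parallel   # query all upstreams simultaneously
  enable_dnssec: true       # validate DNSSEC signatures
```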
&lt;h3&gt;


Conclusion
&lt;/h3&gt;


&lt;p&gt;I've now built an incredibly robust, multi-layered DNS infrastructure. My home devices use the primary local server, which is backed up by a second, independent local server, and my mobile devices use a completely separate cloud instance for on-the-go protection. This provides a resilient, secure, and ad-free internet experience.&lt;/p&gt;

&lt;p&gt;In the final part of this series, we'll shift our focus from deploying services to maintaining them. I'll show you how I set up a fully automated operations pipeline for my homelab, including daily off-site backups, automatic container updates with Watchtower, and proactive alerting with Prometheus. Stay tuned!&lt;/p&gt;

</description>
      <category>linux</category>
      <category>oracle</category>
      <category>cloud</category>
      <category>adguard</category>
    </item>
    <item>
      <title>Part 2: Homelab Management &amp; Monitoring</title>
      <dc:creator>Prajwol Adhikari</dc:creator>
      <pubDate>Sun, 10 May 2026 00:42:07 +0000</pubDate>
      <link>https://forem.com/prajwol-ad/part-2-homelab-management-monitoring-4oea</link>
      <guid>https://forem.com/prajwol-ad/part-2-homelab-management-monitoring-4oea</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Welcome to Part 2 of my homelab series! In &lt;a href="https://dev.to/prajwol-ad/part-1-reviving-an-old-laptop-with-debian-docker-3fhp"&gt;Part 1&lt;/a&gt;, we built a solid foundation by turning an old laptop into a hardened Debian server with Docker. Now that our server is running, we need to deploy services to manage, monitor, and easily access our projects.&lt;/p&gt;

&lt;p&gt;In this guide, we'll deploy three essential stacks. First, &lt;strong&gt;Nginx Proxy Manager (NPM)&lt;/strong&gt; will act as our server's front door and create a shared network for our containers. Second, we'll set up a professional-grade monitoring stack with &lt;strong&gt;Prometheus&lt;/strong&gt; and &lt;strong&gt;Grafana&lt;/strong&gt;. Finally, we'll deploy a &lt;strong&gt;Homer&lt;/strong&gt; dashboard to create a beautiful and convenient launchpad for all our services.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. The Management Layer: Nginx Proxy Manager (NPM) 🌐
&lt;/h3&gt;

&lt;p&gt;Before we can deploy our other services, we need a way to manage connections between them. NPM will act as our reverse proxy and, crucially, will create the shared Docker network that all our other services will connect to.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;A. Deploy Nginx Proxy Manager&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;First, let's create a directory and the &lt;code&gt;docker-compose.yml&lt;/code&gt; file for NPM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create the directory&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/docker/npm
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/docker/npm

&lt;span class="c"&gt;# Create the docker-compose.yml&lt;/span&gt;
nano docker-compose.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Paste in the following configuration. This file defines the NPM service and creates a network named &lt;code&gt;npm_default&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;services:
  app:
    image: 'jc21/nginx-proxy-manager:latest'
    container_name: npm-app-1
    restart: unless-stopped
    ports:
      - '80:80'
      - '443:443'
      - '81:81'
    volumes:
      - ./data:/data
      - ./letsencrypt:/etc/letsencrypt

networks:
  default:
    name: npm_default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Launch it with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up -d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You can now log in to the admin UI at &lt;code&gt;http://&amp;lt;your-server-ip&amp;gt;:81&lt;/code&gt;.&lt;/p&gt;
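Any stack we deploy later can attach to this shared network with the same two-line pattern. Here is a minimal, illustrative fragment — the `myapp` service name and its image are placeholders, not part of this guide's stacks:

```yaml
# Illustrative fragment: how a future stack joins NPM's shared network.
# "myapp" and its image are placeholders.
services:
  myapp:
    image: nginx:alpine
    networks:
      - default

networks:
  default:
    name: npm_default
    external: true
```

With `external: true`, Compose attaches to the existing `npm_default` network instead of creating a new one, so NPM can reach the container by its service name.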




&lt;h3&gt;
  
  
  2. The Monitoring Stack 📊
&lt;/h3&gt;

&lt;p&gt;With our shared network in place, we can now deploy our monitoring stack.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prometheus:&lt;/strong&gt; Collects all the metrics.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Node Exporter:&lt;/strong&gt; Exposes the server's hardware metrics.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;cAdvisor:&lt;/strong&gt; Exposes Docker container metrics.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Grafana:&lt;/strong&gt; Visualizes all the data in beautiful dashboards.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;A. Create the Prometheus Configuration&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Prometheus needs a config file to know what to monitor.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create the project directory&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/docker/monitoring
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/docker/monitoring

&lt;span class="c"&gt;# Create the prometheus.yml file&lt;/span&gt;
nano prometheus.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Paste in the following configuration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;B. Deploy the Stack with Docker Compose&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Next, create the &lt;code&gt;docker-compose.yml&lt;/code&gt; file in the same &lt;code&gt;~/docker/monitoring&lt;/code&gt; directory.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nano docker-compose.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This file defines all four monitoring services and tells them to connect to the &lt;code&gt;npm_default&lt;/code&gt; network we created earlier.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    networks:
      - default

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3001:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    networks:
      - default

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    networks:
      - default

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: unless-stopped
    ports:
      - "8081:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    networks:
      - default

volumes:
  prometheus_data:
  grafana_data:

networks:
  default:
    name: npm_default
    external: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now, launch the stack:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up -d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;C. Configure Grafana&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Log in to Grafana at &lt;code&gt;http://&amp;lt;your-server-ip&amp;gt;:3001&lt;/code&gt; (default: &lt;code&gt;admin&lt;/code&gt;/&lt;code&gt;admin&lt;/code&gt;).&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Add Data Source:&lt;/strong&gt; Go to Connections &amp;gt; Data Sources, add a &lt;strong&gt;Prometheus&lt;/strong&gt; source, and set the URL to &lt;code&gt;http://prometheus:9090&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Import Dashboards:&lt;/strong&gt; Go to Dashboards &amp;gt; New &amp;gt; Import and add these dashboards by ID:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- **Node Exporter Full (ID: `1860`)**

- **Docker Host/Container Metrics (ID: `193`)**
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
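Before importing dashboards, you can sanity-check the data source in Grafana's Explore view. These two queries use the standard metric names exposed by Node Exporter and cAdvisor:

```
# Non-idle CPU rate per core over the last 5 minutes (Node Exporter)
rate(node_cpu_seconds_total{mode!="idle"}[5m])

# Current memory usage per running container (cAdvisor)
container_memory_usage_bytes{name!=""}
```

If both return series, scraping and the shared network are working end to end.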




&lt;h3&gt;
  
  
  3. The Homer Launchpad Dashboard 🚀
&lt;/h3&gt;

&lt;p&gt;Finally, let's deploy Homer as our beautiful start page with custom icons.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create Directories &amp;amp; Download Icons:&lt;/strong&gt; First, create a directory for Homer and an &lt;code&gt;assets&lt;/code&gt; subdirectory. Then, &lt;code&gt;cd&lt;/code&gt; into the &lt;code&gt;assets&lt;/code&gt; folder and download the icons.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mkdir -p ~/docker/homer/assets
cd ~/docker/homer/assets
wget -O grafana.png https://raw.githubusercontent.com/walkxcode/dashboard-icons/main/png/grafana.png
wget -O prometheus.png https://raw.githubusercontent.com/walkxcode/dashboard-icons/main/png/prometheus.png
wget -O cadvisor.png https://raw.githubusercontent.com/walkxcode/dashboard-icons/main/png/cadvisor.png
wget -O npm.png https://nginxproxymanager.com/icon.png
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create Configuration:&lt;/strong&gt; Go back to the main &lt;code&gt;homer&lt;/code&gt; directory and create the &lt;code&gt;config.yml&lt;/code&gt; file.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cd ~/docker/homer
nano config.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Paste in the following configuration. The &lt;code&gt;logo:&lt;/code&gt; lines point to the icons we just downloaded.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;title: "Homelab Dashboard"
subtitle: "Server Management"
theme: "dark"

services:
  - name: "Management"
    icon: "fas fa-server"
    items:
      - name: "Nginx Proxy Manager"
        logo: "assets/tools/npm.png"
        subtitle: "Reverse Proxy Admin"
        url: "http://&amp;lt;your-server-ip&amp;gt;:81"

  - name: "Monitoring"
    icon: "fas fa-chart-bar"
    items:
      - name: "Grafana"
        logo: "assets/tools/grafana.png"
        subtitle: "Metrics Dashboard"
        url: "http://&amp;lt;your-server-ip&amp;gt;:3001"
      - name: "Prometheus"
        logo: "assets/tools/prometheus.png"
        subtitle: "Metrics Database"
        url: "http://&amp;lt;your-server-ip&amp;gt;:9090"
      - name: "cAdvisor"
        logo: "assets/tools/cadvisor.png"
        subtitle: "Container Metrics"
        url: "http://&amp;lt;your-server-ip&amp;gt;:8081"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create Docker Compose File:&lt;/strong&gt; Finally, create the &lt;code&gt;docker-compose.yml&lt;/code&gt; file.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nano docker-compose.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This configuration connects Homer to our shared network.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;services:
  homer:
    image: b4bz/homer
    container_name: homer
    volumes:
      - ./config.yml:/www/assets/config.yml
      - ./assets:/www/assets/tools
    ports:
      - "8090:8080"
    restart: unless-stopped
    networks:
      - npm_default

networks:
  npm_default:
    external: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Launch:&lt;/strong&gt; Run &lt;code&gt;docker compose up -d&lt;/code&gt;. You can now access your new dashboard with custom icons at &lt;code&gt;http://&amp;lt;your-server-ip&amp;gt;:8090&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Our homelab now has a powerful management and monitoring foundation. Nginx Proxy Manager is ready to direct traffic, Grafana is visualizing our server's health, and Homer provides a central launchpad.&lt;/p&gt;

&lt;p&gt;In the next part of the series, we'll deploy our core network service, &lt;strong&gt;AdGuard Home&lt;/strong&gt;, and use NPM to create clean, memorable local domains for all the applications we set up today. Stay tuned!&lt;/p&gt;

</description>
      <category>docker</category>
      <category>nginx</category>
      <category>linux</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Part 1: Reviving an Old Laptop with Debian &amp; Docker</title>
      <dc:creator>Prajwol Adhikari</dc:creator>
      <pubDate>Sun, 10 May 2026 00:37:57 +0000</pubDate>
      <link>https://forem.com/prajwol-ad/part-1-reviving-an-old-laptop-with-debian-docker-3fhp</link>
      <guid>https://forem.com/prajwol-ad/part-1-reviving-an-old-laptop-with-debian-docker-3fhp</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Welcome to the first post in my new homelab series! I've always been fascinated by self-hosting and DevOps, and I believe the best way to learn is by doing. In this series, I'll document my journey of turning an old, unused laptop into a powerful, efficient, and secure bare-metal server for hosting a variety of network services.&lt;/p&gt;

&lt;p&gt;The goal for this first part is to lay a solid foundation. We'll take an old laptop, install a minimal and stable Linux operating system, perform some initial security hardening, and set up Docker as our containerization engine. By the end of this post, we'll have a perfect blank canvas ready for the exciting services we'll deploy in the upcoming parts.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. Choosing the Hardware &amp;amp; OS
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Why an Old Laptop?
&lt;/h4&gt;

&lt;p&gt;Before diving in, why use an old laptop instead of a Raspberry Pi or a dedicated server? For a starter homelab, a laptop has three huge advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost-Effective:&lt;/strong&gt; It's free if you have one lying around!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in UPS:&lt;/strong&gt; The battery acts as a built-in Uninterruptible Power Supply (UPS), keeping the server running through short power outages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low Power Consumption:&lt;/strong&gt; Laptop hardware is designed to be power-efficient, which is great for a device that will be running 24/7.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Why Debian 13 "Trixie"?
&lt;/h4&gt;

&lt;p&gt;For the operating system, I chose Debian. It's renowned for its stability, security, and massive package repository. It’s the bedrock of many other distributions (like Ubuntu) and is perfect for a server because it's lightweight and doesn't include unnecessary software. We'll be using the minimal "net-install" to ensure we only install what we absolutely need.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Installation and Network Configuration
&lt;/h3&gt;

&lt;p&gt;The installation process is straightforward, but the network setup is key to a reliable server.&lt;/p&gt;

&lt;h4&gt;
  
  
  Minimal Installation
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create a Bootable USB:&lt;/strong&gt; I downloaded the Debian 13 "netinst" ISO from the official website and used Rufus on Windows to create a bootable USB drive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Boot from USB:&lt;/strong&gt; I plugged the USB into the laptop and booted from it (usually pressing &lt;strong&gt;F12&lt;/strong&gt;, &lt;strong&gt;F2&lt;/strong&gt;, or &lt;strong&gt;Esc&lt;/strong&gt; during startup to select the USB device).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Language, Location, and Keyboard:&lt;/strong&gt; Selected English, United States, and the default keyboard layout.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network Setup:&lt;/strong&gt; Connected the laptop to my home network (Ethernet preferred for stability).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hostname &amp;amp; Domain:&lt;/strong&gt; Entered a short, memorable hostname for the server (e.g., &lt;code&gt;homelab&lt;/code&gt;) and left the domain blank.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User Accounts:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Set a &lt;strong&gt;root password&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Created a non-root &lt;strong&gt;regular user&lt;/strong&gt; (this will be used for daily management).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partition Disks:&lt;/strong&gt; Chose &lt;strong&gt;Guided – use entire disk&lt;/strong&gt; with &lt;strong&gt;separate /home partition&lt;/strong&gt;. This is simpler for a server setup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Software Selection:&lt;/strong&gt; At the “Software selection” screen:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unchecked&lt;/strong&gt; “Debian desktop environment”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Checked&lt;/strong&gt; “SSH server” and “standard system utilities”&lt;/li&gt;
&lt;li&gt;This ensures a clean command-line system that can be accessed remotely.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GRUB Bootloader:&lt;/strong&gt; Installed GRUB on the primary drive (so the system boots correctly).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Finish Installation:&lt;/strong&gt; Removed the USB drive when prompted and rebooted into the fresh Debian install.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Setting a Static IP
&lt;/h4&gt;

&lt;p&gt;A server needs a permanent, unchanging IP address. The best way to do this is with &lt;strong&gt;DHCP Reservation&lt;/strong&gt; on your router. This tells your router to always assign the same IP address to your server's unique MAC address.&lt;/p&gt;

&lt;p&gt;First, find your laptop’s current IP address and network interface name by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ip a
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’ll see output similar to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;2: enp3s0: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;mtu 1500 qdisc fq_codel state UP group default qlen 1000
&lt;span class="go"&gt;    inet 192.168.0.45/24 brd 192.168.0.255 scope global dynamic enp3s0
       valid_lft 86396sec preferred_lft 86396sec
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Interface name:&lt;/strong&gt; &lt;code&gt;enp3s0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Current IP:&lt;/strong&gt; &lt;code&gt;192.168.0.45&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MAC address:&lt;/strong&gt; shown under &lt;code&gt;link/ether&lt;/code&gt; in the same section.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this info, log into your router’s admin panel, find the "DHCP Reservation" or "Static Leases" section, and assign a memorable IP address (e.g., &lt;code&gt;192.168.0.45&lt;/code&gt;) to your server’s MAC address.&lt;/p&gt;

&lt;p&gt;This ensures the server always gets the same IP from your router, making it easy to find on your network.&lt;/p&gt;
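As a sketch of what you are looking for in that output, this snippet pulls the MAC and IPv4 address out of `ip a`-style text. The sample text is inlined for illustration; on the server, pipe the real `ip a` output through the same `awk` filter:

```shell
# Extract MAC and IPv4 address from `ip a`-style output.
# The sample variable stands in for real output; on the server run:
#   ip a | awk '/link\/ether/ {print "MAC:", $2} /inet / {print "IP:", $2}'
sample='2: enp3s0: mtu 1500 qdisc fq_codel state UP
    link/ether aa:bb:cc:dd:ee:ff brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.45/24 brd 192.168.0.255 scope global dynamic enp3s0'

echo "$sample" | awk '/link\/ether/ {print "MAC:", $2} /inet / {print "IP:", $2}'
# MAC: aa:bb:cc:dd:ee:ff
# IP: 192.168.0.45/24
```

The `/inet /` pattern (with the trailing space) deliberately skips `inet6` lines, so only the IPv4 address is printed.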

&lt;h4&gt;
  
  
  Connecting Remotely with SSH
&lt;/h4&gt;

&lt;p&gt;With a static IP set, all future management will be done remotely using an SSH client. For Windows, I highly recommend &lt;strong&gt;Solar-PuTTY&lt;/strong&gt;. I created a new session, entered the server's static IP address, my username, and password, and connected.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Initial Server Hardening
&lt;/h3&gt;

&lt;p&gt;With a remote SSH session active, the first thing to do is secure the server and configure it for its headless role.&lt;/p&gt;

&lt;h4&gt;
  
  
  Update the System
&lt;/h4&gt;

&lt;p&gt;First, let's make sure all packages are up to date.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;apt upgrade &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Configure the Firewall
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;ufw&lt;/code&gt; (Uncomplicated Firewall) is perfect for a simple setup. We'll set it to deny all incoming traffic by default and only allow SSH connections.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install UFW&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;ufw &lt;span class="nt"&gt;-y&lt;/span&gt;

&lt;span class="c"&gt;# Allow SSH connections&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;ufw allow ssh

&lt;span class="c"&gt;# Enable the firewall&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;ufw &lt;span class="nb"&gt;enable&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Configure Lid-Close Action
&lt;/h4&gt;

&lt;p&gt;To ensure the laptop keeps running when the lid is closed, we edit the &lt;code&gt;logind.conf&lt;/code&gt; file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/systemd/logind.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Uncomment the line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight systemd"&gt;&lt;code&gt;&lt;span class="nt"&gt;HandleLidSwitch&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;ignore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save the file, then restart the service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart systemd-logind.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  4. Installing the Containerization Engine: Docker
&lt;/h3&gt;

&lt;p&gt;Instead of installing applications directly on our host, we'll use Docker to keep the system clean and make management easier.&lt;/p&gt;

&lt;h4&gt;
  
  
  Install Docker Engine
&lt;/h4&gt;

&lt;p&gt;The official convenience script is the easiest way to get the latest version.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://get.docker.com &lt;span class="nt"&gt;-o&lt;/span&gt; get-docker.sh
&lt;span class="nb"&gt;sudo &lt;/span&gt;sh get-docker.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Add User to Docker Group
&lt;/h4&gt;

&lt;p&gt;To run docker commands without &lt;code&gt;sudo&lt;/code&gt;, add your user to the docker group. The &lt;code&gt;$USER&lt;/code&gt; variable automatically uses the currently logged-in user.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;usermod &lt;span class="nt"&gt;-aG&lt;/span&gt; docker &lt;span class="nv"&gt;$USER&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After this, log out and log back in for the change to take effect.&lt;/p&gt;
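A quick way to confirm the new group membership is active in your current session is to check the group list directly — a minimal sketch:

```shell
# Check whether the docker group is active for the current session.
# Membership only appears after logging out and back in (or `newgrp docker`).
if id -nG | grep -qw docker; then
  echo "docker group active"
else
  echo "not yet in docker group for this session"
fi
```

Once it reports the group as active, `docker ps` should work without `sudo`.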

&lt;h4&gt;
  
  
  Install Docker Compose
&lt;/h4&gt;

&lt;p&gt;Docker Compose is essential for managing multi-container applications with a simple YAML file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;docker-compose-plugin &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To verify the installation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;And that's it for Part 1! We've successfully turned an old piece of hardware into a hardened, modern server running Debian and Docker with a reliable network configuration. We have a solid and secure foundation to build upon.&lt;/p&gt;

&lt;p&gt;In the next part of the series, we'll deploy our first critical service: a local, network-wide ad-blocking DNS resolver using AdGuard Home. Stay tuned!&lt;/p&gt;

</description>
      <category>debian</category>
      <category>docker</category>
      <category>linux</category>
      <category>selfhosting</category>
    </item>
    <item>
      <title>Your Personal Internet Guardian: How to Build a FREE Ad-Blocker in the Cloud! 🚀</title>
      <dc:creator>Prajwol Adhikari</dc:creator>
      <pubDate>Tue, 26 Aug 2025 04:21:23 +0000</pubDate>
      <link>https://forem.com/prajwol-ad/your-personal-internet-guardian-how-to-build-a-free-ad-blocker-in-the-cloud-43jb</link>
      <guid>https://forem.com/prajwol-ad/your-personal-internet-guardian-how-to-build-a-free-ad-blocker-in-the-cloud-43jb</guid>
      <description>&lt;p&gt;Hey everyone! A while back, I wrote a guide on setting up AdGuard Home on Linode. The world of tech moves fast, and it's time for an upgrade! Today, we're going to build our own powerful, network-wide ad-blocker using &lt;strong&gt;Amazon Web Services (AWS)&lt;/strong&gt;, and we'll make it secure with our own domain and SSL certificate.&lt;/p&gt;

&lt;p&gt;Think of this as building a digital gatekeeper for your internet. Before any ads, trackers, or malicious sites can reach your devices, our AdGuard Home server will slam the door shut. The best part? This works on your phone, laptop, smart TV—anything on your network—without installing a single app on them.&lt;/p&gt;

&lt;p&gt;This guide is for everyone, from seasoned tech wizards to curious beginners. We'll break down every step in simple terms, so grab a coffee, and let's build something awesome!&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 1: Building Our Home in the AWS Cloud ☁️
&lt;/h3&gt;

&lt;p&gt;First, we need a server. We'll use an &lt;strong&gt;Amazon EC2 instance&lt;/strong&gt;, which is just a fancy name for a virtual computer that you rent.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sign Up for AWS:&lt;/strong&gt; If you don't have an account, head to the &lt;a href="https://aws.amazon.com/" rel="noopener noreferrer"&gt;AWS website&lt;/a&gt; and sign up. You'll need a credit card for verification, but for this guide, we can often stay within the Free Tier.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Launch Your EC2 Instance:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Log in to your AWS Console and search for &lt;strong&gt;EC2&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;"Launch instance"&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Name:&lt;/strong&gt; Give your server a cool name, like &lt;code&gt;AdGuard-Server&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application and OS Images:&lt;/strong&gt; In the search bar, type &lt;code&gt;Debian&lt;/code&gt; and select the latest version (e.g., Debian 12). Make sure it's marked "Free tier eligible".&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instance Type:&lt;/strong&gt; Choose &lt;strong&gt;&lt;code&gt;t2.micro&lt;/code&gt;&lt;/strong&gt;. This is your free, trusty little server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key Pair (for login):&lt;/strong&gt; This is your digital key to the server's front door. Click &lt;strong&gt;"Create a new key pair"&lt;/strong&gt;, name it something like &lt;code&gt;my-adguard-key&lt;/code&gt;, and &lt;strong&gt;download the &lt;code&gt;.pem&lt;/code&gt; file&lt;/strong&gt;. Keep this file secret and safe!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network settings (The Firewall):&lt;/strong&gt; This is crucial. We need to tell our server which doors to open. Click &lt;strong&gt;"Edit"&lt;/strong&gt;.

&lt;ul&gt;
&lt;li&gt;Check the box for &lt;strong&gt;"Allow SSH traffic from"&lt;/strong&gt; and select &lt;strong&gt;&lt;code&gt;My IP&lt;/code&gt;&lt;/strong&gt;. This lets you securely log in.&lt;/li&gt;
&lt;li&gt;Check &lt;strong&gt;"Allow HTTPS traffic from the internet"&lt;/strong&gt; and &lt;strong&gt;"Allow HTTP traffic from the internet"&lt;/strong&gt;. We'll need these for our secure dashboard later.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Launch It!&lt;/strong&gt; Hit the "Launch instance" button and watch as your new cloud server comes to life.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Give Your Server a Permanent Address (Elastic IP):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;By default, your server's public IP address will change every time it reboots. Let's make it permanent!&lt;/li&gt;
&lt;li&gt;In the EC2 menu on the left, go to &lt;strong&gt;"Elastic IPs"&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;"Allocate Elastic IP address"&lt;/strong&gt; and then &lt;strong&gt;"Allocate"&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Select the new IP address from the list, click &lt;strong&gt;"Actions"&lt;/strong&gt;, and then &lt;strong&gt;"Associate Elastic IP address"&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Choose your &lt;code&gt;AdGuard-Server&lt;/code&gt; instance from the list and click &lt;strong&gt;"Associate"&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Your server now has a static IP address that will never change! Make a note of this new IP.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  Chapter 2: Opening the Doors (Configuring the Firewall) 🚪
&lt;/h3&gt;

&lt;p&gt;Our server is running, but we need to open a few more specific doors for AdGuard Home to work.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Go to your EC2 Instance details, click the &lt;strong&gt;"Security"&lt;/strong&gt; tab, and click on the &lt;strong&gt;Security Group name&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Click &lt;strong&gt;"Edit inbound rules"&lt;/strong&gt; and &lt;strong&gt;"Add rule"&lt;/strong&gt; for each of the following:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Port &lt;code&gt;3000&lt;/code&gt;:&lt;/strong&gt; &lt;code&gt;Custom TCP&lt;/code&gt;, Port &lt;code&gt;3000&lt;/code&gt;, Source &lt;code&gt;My IP&lt;/code&gt;. (For the initial setup).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Port &lt;code&gt;53&lt;/code&gt; (TCP):&lt;/strong&gt; &lt;code&gt;Custom TCP&lt;/code&gt;, Port &lt;code&gt;53&lt;/code&gt;, Source &lt;code&gt;Anywhere-IPv4&lt;/code&gt;. (For DNS).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Port &lt;code&gt;53&lt;/code&gt; (UDP):&lt;/strong&gt; &lt;code&gt;Custom UDP&lt;/code&gt;, Port &lt;code&gt;53&lt;/code&gt;, Source &lt;code&gt;Anywhere-IPv4&lt;/code&gt;. (Also for DNS).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Port &lt;code&gt;853&lt;/code&gt;:&lt;/strong&gt; &lt;code&gt;Custom TCP&lt;/code&gt;, Port &lt;code&gt;853&lt;/code&gt;, Source &lt;code&gt;Anywhere-IPv4&lt;/code&gt;. (For DNS-over-TLS).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; Click &lt;strong&gt;"Save rules"&lt;/strong&gt;. Your firewall is now ready!&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  Chapter 3: Installing AdGuard Home 🛡️
&lt;/h3&gt;

&lt;p&gt;Now, let's connect to our server and install the magic software.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Connect via SSH:&lt;/strong&gt; Open a terminal (PowerShell on Windows, Terminal on Mac/Linux) and use the key you downloaded to connect. &lt;strong&gt;Use your new Elastic IP address!&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Replace the path and Elastic IP with your own&lt;/span&gt;
ssh &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s2"&gt;"path/to/my-adguard-key.pem"&lt;/span&gt; admin@YOUR_ELASTIC_IP
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Install AdGuard Home:&lt;/strong&gt; Run this one simple command. It downloads and installs everything for you.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-S&lt;/span&gt; &lt;span class="nt"&gt;-L&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;https://raw.githubusercontent.com/AdguardTeam/AdGuardHome/master/scripts/install.sh]&lt;span class="o"&gt;(&lt;/span&gt;https://raw.githubusercontent.com/AdguardTeam/AdGuardHome/master/scripts/install.sh&lt;span class="o"&gt;)&lt;/span&gt; | sh &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Run the Setup Wizard:&lt;/strong&gt; The script will give you a link, like &lt;code&gt;http://YOUR_ELASTIC_IP:3000&lt;/code&gt;. Open this in your browser. Follow the on-screen steps to create your admin username and password.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  Chapter 4: Teaching Your Guardian Who to Trust and What to Block
&lt;/h3&gt;

&lt;p&gt;With AdGuard Home installed, the next step is to configure its core brain: the DNS servers it gets its answers from and the blocklists it uses to protect your network.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1. Setting Up Upstream DNS Servers&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Think of "Upstream DNS Servers" as the giant, public phonebooks of the internet. When your AdGuard server doesn't know an address (and it's not on a blocklist), it asks one of these upstreams. It's recommended to use a mix of the best encrypted DNS providers for security, privacy, and speed.&lt;/p&gt;

&lt;p&gt;In the AdGuard dashboard, go to &lt;strong&gt;Settings -&amp;gt; DNS settings&lt;/strong&gt;. In the &lt;strong&gt;"Upstream DNS servers"&lt;/strong&gt; box, enter the following, one per line:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Quad9:&lt;/strong&gt; &lt;a href="https://dns.quad9.net/dns-query" rel="noopener noreferrer"&gt;https://dns.quad9.net/dns-query&lt;/a&gt; (focuses heavily on security, blocking malicious domains)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Google:&lt;/strong&gt; &lt;a href="https://dns.google/dns-query" rel="noopener noreferrer"&gt;https://dns.google/dns-query&lt;/a&gt; (known for being very fast)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cloudflare:&lt;/strong&gt; &lt;a href="https://dns.cloudflare.com/dns-query" rel="noopener noreferrer"&gt;https://dns.cloudflare.com/dns-query&lt;/a&gt; (a great all-around choice with a strong focus on privacy)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;2. Optimizing DNS Performance&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Still in the &lt;strong&gt;DNS settings&lt;/strong&gt; page, scroll down to optimize how your server queries the upstreams.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Parallel requests:&lt;/strong&gt; Select this option. This is the fastest and most resilient mode. It sends your DNS query to all three of your upstream servers at the same time and uses the answer from the very first one that responds. This ensures you always get the quickest possible result.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enable EDNS client subnet (ECS):&lt;/strong&gt; Check this box. This is very important for services like Netflix, YouTube, and other content delivery networks (CDNs). It helps them give you content from a server that is geographically closest to you, resulting in faster speeds and a better experience.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;3. Enabling DNSSEC&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Right below the upstream servers, there's a checkbox for &lt;strong&gt;"Enable DNSSEC"&lt;/strong&gt;. You should check this box. DNSSEC is like a digital wax seal on a letter; it verifies that the DNS answers you're getting are authentic and haven't been tampered with. It's a simple, one-click security boost.&lt;/p&gt;
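&lt;p&gt;Once DNSSEC is enabled, you can watch it work with &lt;code&gt;dig&lt;/code&gt; from any machine. This is a sketch: &lt;code&gt;YOUR_ELASTIC_IP&lt;/code&gt; is a placeholder, and the queried domain is simply a well-known signed zone.&lt;/p&gt;

```shell
# A validating resolver sets the "ad" (authenticated data) flag for
# correctly signed domains; look for "ad" in the printed flags line.
dig @YOUR_ELASTIC_IP +dnssec internetsociety.org A | grep -i flags
```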

&lt;h4&gt;
  
  
  &lt;strong&gt;4. Choosing Your Blocklists&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;This is the fun part—the actual ad-blocking! Go to &lt;strong&gt;Filters -&amp;gt; DNS blocklists&lt;/strong&gt;. For a "Balanced &amp;amp; Powerful" setup that blocks aggressively without a high risk of breaking websites, enable the following lists:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AdGuard DNS filter:&lt;/strong&gt; A great, well-maintained baseline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OISD Blocklist Big:&lt;/strong&gt; Widely considered one of the best all-in-one lists for blocking ads, trackers, and malware.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HaGeZi's Pro Blocklist:&lt;/strong&gt; A fantastic list that adds another layer of aggressive blocking for privacy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HaGeZi's Threat Intelligence Feed:&lt;/strong&gt; A crucial security-only list that focuses on protecting against active threats like phishing and malware.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This combination will give you robust protection against both annoyances and real dangers.&lt;/p&gt;




&lt;h3&gt;
  
  
  Chapter 5: Giving Your Server a Name (Free Domain with No-IP) 📛
&lt;/h3&gt;

&lt;p&gt;An IP address is hard to remember. Let's get a free, memorable name for our server.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Sign Up at No-IP:&lt;/strong&gt; Go to &lt;a href="https://www.noip.com/" rel="noopener noreferrer"&gt;No-IP.com&lt;/a&gt;, create a free account, and create a &lt;strong&gt;hostname&lt;/strong&gt; (e.g., &lt;code&gt;my-dns.ddns.net&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Point it to Your Server:&lt;/strong&gt; When creating the hostname, enter your server's permanent &lt;strong&gt;Elastic IP address&lt;/strong&gt;. Confirm your account via email.&lt;/li&gt;
&lt;/ol&gt;
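&lt;p&gt;A quick lookup confirms the record took effect (the hostname below is a placeholder; use your own):&lt;/p&gt;

```shell
# Should print your Elastic IP once No-IP has published the record.
# DNS propagation is usually quick, but can take a few minutes.
dig +short my-dns.ddns.net A
```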




&lt;h3&gt;
  
  
  Chapter 6: Making It Secure with SSL/TLS 🔐
&lt;/h3&gt;

&lt;p&gt;We'll use &lt;strong&gt;Let's Encrypt&lt;/strong&gt; and &lt;strong&gt;Certbot&lt;/strong&gt; to get a free SSL certificate, which lets us use secure &lt;code&gt;https://&lt;/code&gt; and encrypted DNS.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Install Certbot:&lt;/strong&gt; In your SSH session, run these commands:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;certbot &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Get the Certificate:&lt;/strong&gt; Run this command, replacing the email and domain with your own.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# This command will temporarily stop any service on port 80, get the certificate, and then finish.&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;certbot certonly &lt;span class="nt"&gt;--standalone&lt;/span&gt; &lt;span class="nt"&gt;--agree-tos&lt;/span&gt; &lt;span class="nt"&gt;--email&lt;/span&gt; YOUR_EMAIL@example.com &lt;span class="nt"&gt;-d&lt;/span&gt; your-no-ip-hostname.ddns.net
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;If it's successful, it will tell you where your certificate files are saved (usually in &lt;code&gt;/etc/letsencrypt/live/your-no-ip-hostname.ddns.net/&lt;/code&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Configure AdGuard Home Encryption:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to your AdGuard Home dashboard (&lt;strong&gt;Settings -&amp;gt; Encryption settings&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;Check &lt;strong&gt;"Enable encryption"&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;In the &lt;strong&gt;"Server name"&lt;/strong&gt; field, enter your No-IP hostname.&lt;/li&gt;
&lt;li&gt;Under &lt;strong&gt;"Certificates"&lt;/strong&gt;, choose &lt;strong&gt;"Set a certificates file path"&lt;/strong&gt;.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Certificate path:&lt;/strong&gt; &lt;code&gt;/etc/letsencrypt/live/your-no-ip-hostname.ddns.net/fullchain.pem&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Private key path:&lt;/strong&gt; &lt;code&gt;/etc/letsencrypt/live/your-no-ip-hostname.ddns.net/privkey.pem&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;"Save configuration"&lt;/strong&gt;. The page will reload on a secure &lt;code&gt;https://&lt;/code&gt; connection!&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  Chapter 7: Automating SSL Renewal (Cron Job Magic) ✨
&lt;/h3&gt;

&lt;p&gt;Let's Encrypt certificates last for 90 days. We can tell our server to automatically renew them.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Open the Cron Editor:&lt;/strong&gt; In SSH, run &lt;code&gt;sudo crontab -e&lt;/code&gt; and choose &lt;code&gt;nano&lt;/code&gt; as your editor.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Add the Renewal Job:&lt;/strong&gt; Add this line to the bottom of the file. It tells the server to try renewing the certificate every day at 2:30 AM.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;30 2 * * * /usr/bin/certbot renew --quiet
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Save and exit (&lt;code&gt;Ctrl+X&lt;/code&gt;, then &lt;code&gt;Y&lt;/code&gt;, then &lt;code&gt;Enter&lt;/code&gt;). Your server will now keep its certificate fresh forever!&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
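&lt;p&gt;Certbot also ships a built-in rehearsal mode, which is a safer check than waiting 90 days to find out the cron job fails. It runs the full renewal against Let's Encrypt's staging environment without touching your real certificate:&lt;/p&gt;

```shell
# Simulates renewal end to end. Like the original issuance, a
# standalone renewal needs port 80 to be free while it runs.
sudo certbot renew --dry-run
```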




&lt;h3&gt;
  
  
  Chapter 8: Testing Your New Superpowers (DoH &amp;amp; DoT) 🧪
&lt;/h3&gt;

&lt;p&gt;For a direct confirmation, I used these commands on my computer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;DNS-over-HTTPS (DoH) Test:&lt;/strong&gt; This test checks if the secure web endpoint for DNS is alive.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;https://your-no-ip-hostname.ddns.net/dns-query]&lt;span class="o"&gt;(&lt;/span&gt;https://your-no-ip-hostname.ddns.net/dns-query&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;I got a "405 Method Not Allowed" error, which sounds bad but is actually &lt;strong&gt;great news&lt;/strong&gt;. It means I successfully connected to the server, which correctly told me I didn't send a real query. The connection works!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;DNS-over-TLS (DoT) Test:&lt;/strong&gt; This checks the dedicated secure port for DNS. I used a tool called &lt;code&gt;kdig&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# I had to install it first with: sudo apt install knot-dnsutils&lt;/span&gt;
kdig @your-no-ip-hostname.ddns.net +tls-ca +tls-host&lt;span class="o"&gt;=&lt;/span&gt;your-no-ip-hostname.ddns.net example.com
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;The command returned a perfect DNS answer for &lt;code&gt;example.com&lt;/code&gt;, confirming the secure tunnel was working.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Chapter 9: Protecting Your Kingdom (Router &amp;amp; Phone Setup) 🏰
&lt;/h3&gt;

&lt;p&gt;Now, let's point your devices to their new guardian.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;On Your Home Router:&lt;/strong&gt; Log in to your router's admin page, find the DNS settings, and enter your server's &lt;strong&gt;Elastic IP&lt;/strong&gt; as the primary DNS server. Leave the secondary field blank! This forces all devices on your Wi-Fi to be protected. Then, restart your router.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On Your Mobile Phone:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Android:&lt;/strong&gt; Go to &lt;strong&gt;Settings -&amp;gt; Network -&amp;gt; Private DNS&lt;/strong&gt;. Choose "Private DNS provider hostname" and enter your No-IP hostname (&lt;code&gt;my-dns.ddns.net&lt;/code&gt;). This gives you ad-blocking everywhere, even on cellular data!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;iOS:&lt;/strong&gt; You can use a profile to configure DoH. A simple way is to use a site like &lt;a href="https://adguard-dns.io/en/public-dns.html" rel="noopener noreferrer"&gt;AdGuard's DNS profile generator&lt;/a&gt;, but enter your own server's DoH address (&lt;code&gt;https://my-dns.ddns.net/dns-query&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
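&lt;p&gt;After switching a device over, one lookup tells you which resolver is actually answering. Run this on the device or laptop you just configured; no placeholders needed:&lt;/p&gt;

```shell
# The "Server" line should show your Elastic IP (router path) or
# your No-IP hostname (Android Private DNS path).
nslookup example.com
```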




&lt;h3&gt;
  
  
  Chapter 10: The Ultimate Safety Net (Creating a Snapshot) 📸
&lt;/h3&gt;

&lt;p&gt;Finally, let's back up our perfect setup.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; In the EC2 Console, go to your instance details.&lt;/li&gt;
&lt;li&gt; Click the &lt;strong&gt;"Storage"&lt;/strong&gt; tab and click the &lt;strong&gt;"Volume ID"&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Click &lt;strong&gt;"Actions" -&amp;gt; "Create snapshot"&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Give it a description, like &lt;code&gt;AdGuard-Working-Setup-Backup&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you ever mess something up, you can use this snapshot to restore your server to this exact working state in minutes.&lt;/p&gt;
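&lt;p&gt;If you prefer the command line, the same backup can be taken with the AWS CLI. This sketch assumes the CLI is installed and configured with credentials; the volume ID is a placeholder you can read off the "Storage" tab.&lt;/p&gt;

```shell
# Snapshot the instance's root EBS volume.
aws ec2 create-snapshot \
  --volume-id vol-0123456789abcdef0 \
  --description "AdGuard-Working-Setup-Backup"
```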

&lt;h3&gt;
  
  
  Bonus Chapter: Common Troubleshooting Tips
&lt;/h3&gt;

&lt;p&gt;If things aren't working, here are a few common pitfalls to check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Browser Overrides Everything:&lt;/strong&gt; If one device isn't blocking ads, check its browser settings! Modern browsers like Chrome have a "Secure DNS" feature that can bypass your custom setup. You may need to turn this off.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check Your Laptop's DNS:&lt;/strong&gt; Make sure your computer's network settings are set to "Obtain DNS automatically" so it listens to the router. A manually set DNS on your PC will ignore the router's settings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Beware of IPv6:&lt;/strong&gt; If you run into trouble on one device, try disabling IPv6 in that device's Wi-Fi adapter properties to force it to use your working IPv4 setup.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  It’s a Wrap!
&lt;/h3&gt;

&lt;p&gt;And there you have it! You've successfully built a personal, secure, ad-blocking DNS server in the cloud. You've learned about cloud computing, firewalls, DNS, SSL, and automation. Go enjoy a faster, cleaner, and more private internet experience.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>adguard</category>
      <category>dns</category>
      <category>ssl</category>
    </item>
    <item>
      <title>Running Private Adguard Server on Cloud (Linode)</title>
      <dc:creator>Prajwol Adhikari</dc:creator>
      <pubDate>Sat, 01 Mar 2025 04:44:00 +0000</pubDate>
      <link>https://forem.com/prajwol-ad/running-private-adguard-server-on-cloud-linode-18ef</link>
      <guid>https://forem.com/prajwol-ad/running-private-adguard-server-on-cloud-linode-18ef</guid>
      <description>&lt;h1&gt;
  
  
  What's the buzz about AdGuard Home?
&lt;/h1&gt;

&lt;p&gt;Imagine AdGuard Home as your personal internet guardian. This versatile tool blocks ads, trackers, and other online nuisances across all devices connected to your network. Whether you're browsing on your phone, tablet, or computer, AdGuard Home has your back.&lt;/p&gt;

&lt;p&gt;In today's digital landscape, robust security measures are paramount. Protecting each device shields your family from accidental clicks and malicious attacks, ensuring peace of mind and a secure online environment.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why on the Cloud?
&lt;/h1&gt;

&lt;p&gt;While setting up AdGuard Home on your home network is great, installing it on a cloud server like Linode takes things up a notch. Here's why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;On-the-Go Protection&lt;/strong&gt;: Your devices stay protected from ads and trackers no matter where you are, and you can even share the server with your family.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Centralized Control&lt;/strong&gt;: Manage and customize your ad-blocking settings from a single dashboard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced Privacy&lt;/strong&gt;: Keep your browsing data away from prying eyes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ready to embark on this ad-free adventure? Let's get started!&lt;/p&gt;




&lt;h1&gt;
  
  
  Setting Up The Environment
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Step 1: Create a Linode Cloud Account
&lt;/h2&gt;

&lt;p&gt;Why choose Linode? Through &lt;a href="https://linode.com/networkchuck" rel="noopener noreferrer"&gt;NetworkChuck's referral link&lt;/a&gt;, you receive a generous $100 cloud credit - a fantastic start!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sign Up&lt;/strong&gt;: Navigate to &lt;a href="https://linode.com/networkchuck" rel="noopener noreferrer"&gt;Linode's signup page&lt;/a&gt; and register.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access the Dashboard&lt;/strong&gt;: Log in and select 'Linodes' from the left-side menu.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create a Linode&lt;/strong&gt;: Click 'Create Linode,' choose your preferred region, and select an operating system (Debian 11 is a solid choice).
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favylkbj21se58s82ir2w.jpg" alt=" " width="800" height="452"&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose a Plan&lt;/strong&gt;: The Shared 1GB Nanode instance is sufficient for AdGuard Home.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Label and Secure&lt;/strong&gt;: Assign a label to your Linode and set a strong root password.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy&lt;/strong&gt;: Click 'Create Linode' and wait for it to initialize.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once your Linode is up and running, access it via the LISH Console or SSH (log in as root).&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: Installing AdGuard Home on Linode
&lt;/h2&gt;

&lt;p&gt;Yes, we're already at the setup stage.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Log In&lt;/strong&gt;: Access your Linode using SSH or the LISH Console with your root credentials.&lt;/li&gt;
&lt;li&gt; Update the system:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt update &amp;amp;&amp;amp; apt upgrade -y
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Go ahead and copy this command to install AdGuard Home:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -s -S -L https://raw.githubusercontent.com/AdguardTeam/AdGuardHome/master/scripts/install.sh | sh -s -- -v
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AdGuard Home is now installed and running. (Tip: use &lt;strong&gt;Ctrl+Shift+V&lt;/strong&gt; to paste into the terminal.)&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: Configure AdGuard Home
&lt;/h2&gt;

&lt;p&gt;Post-installation, you'll see a list of IP addresses with port &lt;code&gt;:3000&lt;/code&gt;.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg762szyxj4cn71wgdx4w.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg762szyxj4cn71wgdx4w.jpg" alt=" " width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Access the Web Interface&lt;/strong&gt;: Open your browser and navigate to the IP address followed by &lt;code&gt;:3000&lt;/code&gt;. If you encounter a security warning, proceed by clicking "Continue to site."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Initial Setup&lt;/strong&gt;: Click 'Get Started' and follow the prompts. When uncertain, default settings are typically fine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set Credentials&lt;/strong&gt;: Set up the Username and Password.&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;
  
  
  Step 4: Integrate AdGuard Home with Your Router
&lt;/h2&gt;

&lt;p&gt;After this, your AdGuard Home server is running, but to protect all your devices you need to set it up inside your home router. I can't walk you through every router's settings page, but the steps are pretty similar across brands.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Find your router's IP address; it is usually printed on the back of the router (commonly 192.168.0.1 or 192.168.1.1). Enter it into your browser.&lt;/li&gt;
&lt;li&gt;Log in to your router using the credentials printed on the back of the router; the default is often admin for both username and password. I suggest you change this default password.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure DNS Settings&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Enable DHCP Server&lt;/strong&gt;: Ensure your router's DHCP server is active.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set DNS Addresses&lt;/strong&gt;: Input your AdGuard Home server's IP as the primary DNS (mine was 96.126.113.207). For secondary DNS, options like &lt;code&gt;1.1.1.1&lt;/code&gt; (Cloudflare), &lt;code&gt;9.9.9.9&lt;/code&gt; (Quad9), or &lt;code&gt;8.8.8.8&lt;/code&gt; (Google) are reliable.&lt;/li&gt;
&lt;li&gt;Save and apply the changes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;


&lt;h1&gt;
  
  
  Fine-Tuning AdGuard Home
&lt;/h1&gt;

&lt;p&gt;If you've done everything up to here, you should be good, but for those who enjoy customization, AdGuard Home offers a plethora of settings. Some of the customizations I did are:&lt;/p&gt;
&lt;h2&gt;
  
  
  Settings
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Go to Settings -&amp;gt; General Settings: you can enable Parental Control and Safe Search.&lt;/li&gt;
&lt;li&gt;You can also keep your Statistics for longer than the default 24 hours.&lt;/li&gt;
&lt;li&gt;Now on Settings -&amp;gt; DNS Settings

&lt;ul&gt;
&lt;li&gt;By default it uses Quad9's DNS, which is pretty good, but I suggest adding more.&lt;/li&gt;
&lt;li&gt;You can click on "list of known DNS providers" and choose from there.&lt;/li&gt;
&lt;li&gt;I used: 

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dns.quad9.net/dns-query" rel="noopener noreferrer"&gt;https://dns.quad9.net/dns-query&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dns.google/dns-query" rel="noopener noreferrer"&gt;https://dns.google/dns-query&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dns.cloudflare.com/dns-query" rel="noopener noreferrer"&gt;https://dns.cloudflare.com/dns-query&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Enable 'Load Balancing' to distribute queries evenly.&lt;/li&gt;
&lt;li&gt;Scroll down to 'DNS server configuration' and enable DNSSEC for enhanced security.&lt;/li&gt;
&lt;li&gt;Click on Save.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Filters
&lt;/h2&gt;
&lt;h3&gt;
  
  
  DNS blocklists
&lt;/h3&gt;

&lt;p&gt;Go to Filters -&amp;gt; DNS blocklists. Here you can add blocklists that other people have created to block even more things. By default AdGuard uses the AdGuard DNS filter, and you can add more.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click on Add blocklist -&amp;gt; Choose from the list&lt;/li&gt;
&lt;li&gt;Don't choose too many from the list, because too many blocklists can slow down your DNS lookups.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2o089a01w7g6goeqzr40.png" alt=" " width="800" height="498"&gt;
These are the blocklists I added. And just like that you are blocking more and more things.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  DNS rewrites
&lt;/h3&gt;

&lt;p&gt;Go to Filters -&amp;gt; DNS rewrites, here you can add your own DNS entries, so I added AdGuard here.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click on Add DNS rewrite&lt;/li&gt;
&lt;li&gt;Type in domain adguardforme.local and your IP address for AdGuard Home.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcthakwgoisle9bc5z5y.jpg" alt=" " width="745" height="770"&gt;
&lt;/li&gt;
&lt;li&gt;And save it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, when I want to open the AdGuard Home dashboard, I just type in adguardforme.local and I'm in; I don't have to remember the IP address.&lt;/p&gt;
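&lt;p&gt;You can prove the rewrite is answered locally, before any upstream is consulted, by querying the server directly. The IP below is mine from earlier; substitute your own server's address.&lt;/p&gt;

```shell
# Should print the AdGuard server's own IP, straight from the rewrite.
dig @96.126.113.207 +short adguardforme.local A
```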
&lt;h3&gt;
  
  
  Custom filtering rules
&lt;/h3&gt;

&lt;p&gt;Go to Filters -&amp;gt; Custom filtering rules. For some reason, when I use Facebook on a mobile device, stories and videos do not load, so I added a custom filtering rule. The &lt;code&gt;@@||&lt;/code&gt; prefix makes it an exception (unblocking) rule, and the &lt;code&gt;$important&lt;/code&gt; modifier lets it override blocklists:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@@||graph.facebook.com^$important
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>adguard</category>
      <category>linode</category>
      <category>dns</category>
      <category>doh</category>
    </item>
    <item>
      <title>Dive into AI Fun: Running DeepSeek-R1 on Docker Container on Ubuntu</title>
      <dc:creator>Prajwol Adhikari</dc:creator>
      <pubDate>Sat, 01 Mar 2025 04:23:28 +0000</pubDate>
      <link>https://forem.com/prajwol-ad/dive-into-ai-fun-running-deepseek-r1-on-docker-container-on-ubuntu-2b89</link>
      <guid>https://forem.com/prajwol-ad/dive-into-ai-fun-running-deepseek-r1-on-docker-container-on-ubuntu-2b89</guid>
      <description>&lt;h1&gt;
  
  
  What's a Docker Container?
&lt;/h1&gt;

&lt;p&gt;Before we dive into setting up DeepSeek-R1, let me explain what a Docker container is. Imagine a toy that works perfectly in one room but breaks the moment you move it to another. A Docker container is like a magic box that keeps your AI model (the toy) in perfect working condition wherever you take it, whether it runs as a background task, on a web server, or in the cloud.&lt;/p&gt;

&lt;p&gt;Docker containers encapsulate everything required to run an application: the code, dependencies, and environment settings. This ensures consistency across different machines, which is super important for AI models that rely on precise configurations.&lt;/p&gt;

&lt;h1&gt;
  
  
  Setting Up The Environment
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Step 1: Install Ubuntu on Windows (If You Haven't Already)
&lt;/h2&gt;

&lt;p&gt;If you're using Windows, the easiest way to get an Ubuntu environment is through the Microsoft Store. Here's how:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the Microsoft Store and search for Ubuntu.&lt;/li&gt;
&lt;li&gt;Click Get and let it install.&lt;/li&gt;
&lt;li&gt;Once installed, open Ubuntu from the Start menu and follow the setup instructions.&lt;/li&gt;
&lt;li&gt;Update the system:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt update &amp;amp;&amp;amp; sudo apt upgrade
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, you have an Ubuntu terminal running on Windows!&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: Install Docker (If You Haven't Already)
&lt;/h2&gt;

&lt;p&gt;First, let's check if you have Docker installed. Open a terminal and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker --version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If that returns a version number, congrats! If not, install Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt update &amp;amp;&amp;amp; sudo apt install docker.io -y
sudo systemctl enable --now docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
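&lt;p&gt;Before moving on, a one-liner confirms the Docker daemon is actually able to pull and run images (this is an optional sanity check; it needs the daemon started above):&lt;/p&gt;

```shell
# Runs Docker's tiny self-test image and removes the container after.
# Prints "Hello from Docker!" when everything is healthy.
sudo docker run --rm hello-world
```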






&lt;h2&gt;
  
  
  Step 3: Prerequisites for NVIDIA GPU
&lt;/h2&gt;

&lt;p&gt;Install NVIDIA Container Toolkit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configuring the production repository:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&amp;amp;&amp;amp; curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Update the package list:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt-get update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Install the NVIDIA Container Toolkit:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt-get install -y nvidia-container-toolkit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  Running Ollama Inside Docker
&lt;/h1&gt;

&lt;p&gt;Run this command (P.S. shoutout to NetworkChuck):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -d \
--gpus all \
-v ollama:/root/.ollama \
-p 11434:11434 \
--security-opt=no-new-privileges \
--cap-drop=ALL \
--cap-add=SYS_NICE \
--memory=8g \
--memory-swap=8g \
--cpus=4 \
--read-only \
--name ollama \
ollama/ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  Running DeepSeek-R1 Locally
&lt;/h1&gt;

&lt;p&gt;Time to bring DeepSeek-R1 to life locally and containerized:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker exec -it ollama ollama run deepseek-r1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also run other sizes of DeepSeek-R1 by appending the version tag after a colon (&lt;code&gt;:&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker exec -it ollama ollama run deepseek-r1:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After this, play around with the AI. When you want to exit, just type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/bye
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Starting Deepseek-R1
&lt;/h1&gt;

&lt;p&gt;To start DeepSeek-R1 next time, open Ubuntu and type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker start ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will start the Ollama Docker container; then type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker exec -it ollama ollama run deepseek-r1:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
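&lt;p&gt;Since the container publishes port 11434, you can also talk to the model over Ollama's HTTP API instead of the interactive prompt, which is handy for scripts. Here is a minimal sketch of the &lt;code&gt;/api/generate&lt;/code&gt; endpoint; it assumes the container from above is running and the model tag has already been pulled.&lt;/p&gt;

```shell
# "stream": false returns one JSON object with the full reply in its
# "response" field, instead of a token-by-token stream.
curl -s http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:7b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```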



</description>
      <category>deepseek</category>
      <category>docker</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
