<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Shankar</title>
    <description>The latest articles on Forem by Shankar (@shankar_t).</description>
    <link>https://forem.com/shankar_t</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1034432%2F631b4b74-d02d-4233-abc4-29fa2349f603.jpeg</url>
      <title>Forem: Shankar</title>
      <link>https://forem.com/shankar_t</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shankar_t"/>
    <language>en</language>
    <item>
      <title>Adding API Gateway to My Cloud Resume</title>
      <dc:creator>Shankar</dc:creator>
      <pubDate>Sun, 12 Apr 2026 05:03:50 +0000</pubDate>
      <link>https://forem.com/shankar_t/adding-api-gateway-to-my-cloud-resume-3bbn</link>
      <guid>https://forem.com/shankar_t/adding-api-gateway-to-my-cloud-resume-3bbn</guid>
      <description>&lt;h1&gt;
  
  
  Five Failures in One Evening: Adding API Gateway to My Cloud Resume
&lt;/h1&gt;

&lt;p&gt;In my &lt;a href="https://medium.com/@tiwarishankart/how-i-built-a-serverless-resume-on-aws-using-terraform" rel="noopener noreferrer"&gt;previous article&lt;/a&gt;, I documented migrating my Cloud Resume from ClickOps to Terraform. The system worked: S3 + CloudFront for the frontend, a Lambda Function URL for the visitor counter, DynamoDB for persistence, and GitHub Actions for CI/CD.&lt;/p&gt;

&lt;p&gt;But the Lambda Function URL had a problem. It was a bare endpoint with no throttling, no API key, and no usage tracking. Anyone with the URL could call it a million times and I'd be paying for a million DynamoDB writes.&lt;/p&gt;

&lt;p&gt;I added API Gateway in front of the Lambda. Five things broke.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I added
&lt;/h2&gt;

&lt;p&gt;Three new Terraform modules:&lt;/p&gt;

&lt;p&gt;An &lt;code&gt;api-gateway&lt;/code&gt; module (&lt;code&gt;modules/api-gateway/&lt;/code&gt;) with a REST API exposing GET and POST on &lt;code&gt;/visitors&lt;/code&gt;, both requiring an API key. It includes a usage plan with rate limiting (5 requests/sec, burst of 10), a monthly quota of 10,000 requests, a MOCK integration for CORS preflight, and CloudWatch access logging on the prod stage.&lt;/p&gt;

&lt;p&gt;A &lt;code&gt;vpc&lt;/code&gt; module (&lt;code&gt;modules/vpc/&lt;/code&gt;) with a 10.0.0.0/16 network, two public subnets and two private subnets across us-east-1a and us-east-1b. I skipped the NAT Gateway because that's $32/month I don't need yet. This is prep for Phase 2 when I add containers or RDS.&lt;/p&gt;

&lt;p&gt;A &lt;code&gt;dns&lt;/code&gt; module (&lt;code&gt;modules/dns/&lt;/code&gt;) for ACM certificate and API Gateway custom domain mapping to &lt;code&gt;api.arlingtonhood21.work&lt;/code&gt;, gated behind a feature flag (&lt;code&gt;enable_custom_domain = false&lt;/code&gt;) since it requires manual DNS validation.&lt;/p&gt;

&lt;p&gt;The frontend JavaScript changed from calling the Lambda Function URL directly to calling the API Gateway with an &lt;code&gt;x-api-key&lt;/code&gt; header.&lt;/p&gt;
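&lt;p&gt;The gate is easy to verify from the terminal. A quick sketch, with the URL and key as placeholders (the real values come from &lt;code&gt;terraform output&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;API_URL="https://EXAMPLE.execute-api.us-east-1.amazonaws.com/prod/visitors"
API_KEY="EXAMPLE-KEY"

# No key: API Gateway answers 403 Forbidden before the Lambda is ever invoked
curl -s -o /dev/null -w "%{http_code}\n" "$API_URL"

# With the key: the request goes through
curl -s -H "x-api-key: $API_KEY" "$API_URL"

# Burst past the limit: after ~10 rapid requests, expect 429 Too Many Requests
for i in $(seq 1 20); do
  curl -s -o /dev/null -w "%{http_code} " -H "x-api-key: $API_KEY" "$API_URL"
done; echo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;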

&lt;h2&gt;
  
  
  Failure 1: Resources already exist
&lt;/h2&gt;

&lt;p&gt;I pushed everything to main. The backend CI ran &lt;code&gt;terraform apply&lt;/code&gt; and failed with four errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ResourceAlreadyExistsException: CloudWatch Logs log group /aws/apigateway/visitor-api already exists
EntityAlreadyExists: Role with name api-gateway-cloudwatch-role already exists
ResourceConflictException: The statement id (AllowAPIGatewayGetInvoke) provided already exists
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I had created these resources locally with &lt;code&gt;terraform apply&lt;/code&gt; before the CI existed. The CI's import step only covered root-level resources (DynamoDB, Lambda, SNS). The new API Gateway module resources weren't in the import list.&lt;/p&gt;

&lt;p&gt;The fix was adding six &lt;code&gt;terraform import&lt;/code&gt; commands for the module resources: the CloudWatch log group, IAM role, role policy attachment, both Lambda permissions, and the API Gateway account.&lt;/p&gt;
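&lt;p&gt;The shape of those commands, sketched with hypothetical resource addresses (the module paths and Lambda function name are illustrative; the import ID formats are the documented ones for each resource type):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Resource addresses below are illustrative; match them to your own module
terraform import module.api_gateway.aws_cloudwatch_log_group.api_gw /aws/apigateway/visitor-api
terraform import module.api_gateway.aws_iam_role.cloudwatch api-gateway-cloudwatch-role

# Attachments import as role-name/policy-arn
terraform import module.api_gateway.aws_iam_role_policy_attachment.cloudwatch \
  api-gateway-cloudwatch-role/arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs

# Lambda permissions import as function-name/statement-id
terraform import module.api_gateway.aws_lambda_permission.get visitor-counter/AllowAPIGatewayGetInvoke
terraform import module.api_gateway.aws_lambda_permission.post visitor-counter/AllowAPIGatewayPostInvoke

# The account-level API Gateway settings import under a fixed placeholder ID
terraform import module.api_gateway.aws_api_gateway_account.this api-gateway-account
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;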

&lt;h2&gt;
  
  
  Failure 2: CI didn't trigger
&lt;/h2&gt;

&lt;p&gt;I pushed the import fix to &lt;code&gt;.github/workflows/backend-cicd.yml&lt;/code&gt;. Nothing happened.&lt;/p&gt;

&lt;p&gt;The workflow trigger only watched &lt;code&gt;resume-backend/**&lt;/code&gt;. The workflow file itself lives at &lt;code&gt;.github/workflows/backend-cicd.yml&lt;/code&gt;, outside that path. GitHub Actions path filters are literal glob matches. If the file you change doesn't match the path filter, the workflow doesn't run.&lt;/p&gt;

&lt;p&gt;I added the workflow file to its own trigger and threw in &lt;code&gt;workflow_dispatch&lt;/code&gt; for manual runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;resume-backend/**'&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.github/workflows/backend-cicd.yml'&lt;/span&gt;
&lt;span class="na"&gt;workflow_dispatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Failure 3: New API Gateway, old URL
&lt;/h2&gt;

&lt;p&gt;After the CI succeeded, the visitor counter showed "--" instead of a number.&lt;/p&gt;

&lt;p&gt;The import step had only imported resources with globally unique identifiers (IAM roles, CloudWatch log groups, Lambda permissions). The REST API itself, its methods, integrations, stages, and deployment weren't imported because they don't have globally unique names. Terraform created a brand new API Gateway with a different ID.&lt;/p&gt;

&lt;p&gt;My frontend was still pointing at the old URL. And Terraform had updated the Lambda permissions to reference the new API Gateway, so the old URL lost its ability to invoke the Lambda. Both URLs were broken.&lt;/p&gt;

&lt;p&gt;I pulled the new URL and API key from &lt;code&gt;terraform output&lt;/code&gt;, updated &lt;code&gt;index.js&lt;/code&gt;, pushed, and invalidated the CloudFront cache.&lt;/p&gt;

&lt;p&gt;Still "--".&lt;/p&gt;

&lt;h2&gt;
  
  
  Failure 4: CORS preflight mismatch
&lt;/h2&gt;

&lt;p&gt;I tested with curl and got &lt;code&gt;{"count": 162}&lt;/code&gt; back. The API worked. The browser was blocking it.&lt;/p&gt;

&lt;p&gt;I sent an OPTIONS request mimicking the browser's preflight check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-D-&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; OPTIONS &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"https://pb5rav4teh.execute-api.us-east-1.amazonaws.com/prod/visitors"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Origin: https://shankar-resume.arlingtonhood21.work"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Access-Control-Request-Method: POST"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Access-Control-Allow-Origin: https://arlingtonhood21.work
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My site loads from &lt;code&gt;https://shankar-resume.arlingtonhood21.work&lt;/code&gt;. The browser does an exact string match. &lt;code&gt;arlingtonhood21.work&lt;/code&gt; does not equal &lt;code&gt;shankar-resume.arlingtonhood21.work&lt;/code&gt;. Preflight rejected, POST blocked.&lt;/p&gt;

&lt;p&gt;The problem was in how API Gateway handles CORS. The actual POST request flows through the Lambda proxy integration, where my Python code checks the &lt;code&gt;Origin&lt;/code&gt; header dynamically and returns the matching origin. But the OPTIONS preflight never reaches the Lambda. It hits a MOCK integration that returns a hardcoded, static value. I had set that static value to the apex domain instead of the subdomain.&lt;/p&gt;
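&lt;p&gt;If you're unsure what the MOCK integration is actually returning, you can read it straight off the deployed API (the IDs below are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Shows the static responseParameters for the OPTIONS preflight,
# including the hardcoded Access-Control-Allow-Origin value
aws apigateway get-integration-response \
  --rest-api-id EXAMPLE123 \
  --resource-id abc123 \
  --http-method OPTIONS \
  --status-code 200
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;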

&lt;p&gt;Curl doesn't send preflight requests, which is why it worked from the terminal. Browsers always send OPTIONS first for cross-origin POST requests with custom headers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Failure 5: The state kept disappearing
&lt;/h2&gt;

&lt;p&gt;I fixed the CORS config, pushed, and the CI created a &lt;em&gt;third&lt;/em&gt; API Gateway. &lt;code&gt;Apply complete! Resources: 35 added, 4 changed, 4 destroyed.&lt;/code&gt; 35 new resources for a one-line change. Something was destroying the Terraform state between runs.&lt;/p&gt;

&lt;p&gt;I checked S3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;aws s3 &lt;span class="nb"&gt;ls &lt;/span&gt;s3://shankar-resume-2025/ &lt;span class="nt"&gt;--recursive&lt;/span&gt;
2026-04-11  404.html
2026-04-11  favicon.svg
2026-04-11  index.html
2026-04-11  index.js
2026-04-11  style.css
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No &lt;code&gt;resume-backend/terraform.tfstate&lt;/code&gt;. Gone.&lt;/p&gt;

&lt;p&gt;My Terraform backend stores state in the same S3 bucket that hosts the frontend. The frontend CI pipeline runs &lt;code&gt;aws s3 sync . s3://bucket --delete&lt;/code&gt; on every push. That &lt;code&gt;--delete&lt;/code&gt; flag removes any S3 object not present in the source directory. The source directory has five HTML/CSS/JS files. It does not have &lt;code&gt;resume-backend/terraform.tfstate&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Every frontend deploy deleted the Terraform state. Every backend CI run started from zero, imported a subset of resources, and created everything else from scratch. That's why the API Gateway URL kept changing.&lt;/p&gt;

&lt;p&gt;The fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;aws s3 sync . s3://${{ secrets.AWS_S3_BUCKET_NAME }} --delete --exclude "resume-backend/*"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After adding the exclude flag, I triggered one final backend CI run, updated the frontend with the fourth and final API Gateway URL, and confirmed the state file survived the next frontend deploy.&lt;/p&gt;
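&lt;p&gt;One habit worth adopting after this: run destructive syncs with &lt;code&gt;--dryrun&lt;/code&gt; first. It prints every upload and delete the command &lt;em&gt;would&lt;/em&gt; perform without touching the bucket:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Prints "(dryrun) delete: s3://..." for anything --delete would remove
aws s3 sync . s3://shankar-resume-2025 --delete --exclude "resume-backend/*" --dryrun
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;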

&lt;h2&gt;
  
  
  What I'd change next time
&lt;/h2&gt;

&lt;p&gt;Terraform state and application assets should not share a bucket. I stored infrastructure state in the same S3 bucket as the website. One &lt;code&gt;--delete&lt;/code&gt; flag on a sync command was all it took to wipe the state on every deploy. If I were starting over, the state bucket would be its own resource with no other purpose.&lt;/p&gt;
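&lt;p&gt;Standing up that dedicated bucket is two CLI calls, and with versioning on, even a bad delete of the state file is recoverable (bucket name illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3api create-bucket --bucket my-tfstate-bucket --region us-east-1

# Versioning keeps every prior revision of terraform.tfstate
aws s3api put-bucket-versioning --bucket my-tfstate-bucket \
  --versioning-configuration Status=Enabled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;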

&lt;p&gt;CORS on API Gateway has two paths, and they don't share configuration. The Lambda handles CORS dynamically for actual requests. The MOCK integration returns static headers for preflight. If those two don't agree on the allowed origin, the browser blocks everything. Curl won't catch this because it skips preflight entirely. I should have tested with browser DevTools instead of curl. A 200 from curl tells you nothing about whether a browser can reach your API.&lt;/p&gt;

&lt;p&gt;The import-on-every-run pattern is a workaround, not a design. It exists because I deployed manually before CI existed. If the first deploy had gone through CI, I would never have needed imports. For new projects: set up CI first, then deploy through it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current state
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser (shankar-resume.arlingtonhood21.work)
  |
  +-- Static files: CloudFront -&amp;gt; S3
  |
  +-- Visitor API: POST /prod/visitors
        |
        API Gateway (EDGE, api-key-required)
          Rate limit: 5 req/sec, burst 10
          Monthly quota: 10,000
          CloudWatch access logging
          |
          Lambda (Python 3.9) -&amp;gt; DynamoDB

Kill Switch:
  AWS Budget ($2/mo) -&amp;gt; SNS -&amp;gt; Lambda -&amp;gt; disables CloudFront
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three API Gateways were created and destroyed in the process. The fourth one stuck.&lt;/p&gt;

&lt;p&gt;Live site: &lt;a href="https://shankar-resume.arlingtonhood21.work" rel="noopener noreferrer"&gt;shankar-resume.arlingtonhood21.work&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>apigateway</category>
      <category>terraform</category>
      <category>devops</category>
    </item>
    <item>
      <title>My K3s Pi Cluster Died After a Reboot: A Troubleshooting Story</title>
      <dc:creator>Shankar</dc:creator>
      <pubDate>Thu, 30 Oct 2025 17:11:36 +0000</pubDate>
      <link>https://forem.com/shankar_t/my-k3s-pi-cluster-died-after-a-reboot-a-troubleshooting-war-story-m93</link>
      <guid>https://forem.com/shankar_t/my-k3s-pi-cluster-died-after-a-reboot-a-troubleshooting-war-story-m93</guid>
      <description>&lt;p&gt;I have a Raspberry Pi homelab running k3s, all managed perfectly with FluxCD and SOPS for secrets. It was stable for weeks.&lt;/p&gt;

&lt;p&gt;Then, I had to reboot my router.&lt;/p&gt;

&lt;p&gt;When it came back up, my Pi was assigned a new IP address (it went from &lt;code&gt;192.168.1.9&lt;/code&gt; to &lt;code&gt;192.168.1.10&lt;/code&gt;). Suddenly, my entire cluster was gone.&lt;/p&gt;

&lt;p&gt;Running &lt;code&gt;kubectl get nodes&lt;/code&gt; from my laptop gave me the dreaded:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The connection to the server 192.168.1.9:6443 was refused...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ah, I thought. "Easy fix." I updated my &lt;code&gt;~/.kube/config&lt;/code&gt; to point to the new IP, &lt;code&gt;192.168.1.10&lt;/code&gt;.&lt;/p&gt;
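&lt;p&gt;As an aside, &lt;code&gt;kubectl&lt;/code&gt; can make that edit for you; k3s names its cluster &lt;code&gt;default&lt;/code&gt; in the kubeconfig it generates:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Point the existing kubeconfig entry at the Pi's new address
kubectl config set-cluster default --server=https://192.168.1.10:6443
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;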

&lt;p&gt;I ran &lt;code&gt;kubectl get nodes&lt;/code&gt; again... and got the &lt;em&gt;same error&lt;/em&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The connection to the server 192.168.1.10:6443 was refused...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This meant the &lt;code&gt;k3s&lt;/code&gt; service itself wasn't running on the Pi. This post is the story of the troubleshooting journey that followed, and the &lt;em&gt;three&lt;/em&gt; major "fixes" it took to get it all back online.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 1: The Crash Loop (On the Pi)
&lt;/h2&gt;

&lt;p&gt;I SSH'd into the Pi to see what was wrong.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh shankarpi@192.168.1.10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First, I checked the service status. This is the #1 thing to do.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl status k3s.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The service was in a permanent crash loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;● k3s.service - Lightweight Kubernetes
     Active: activating &lt;span class="o"&gt;(&lt;/span&gt;auto-restart&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;Result: exit-code&lt;span class="o"&gt;)&lt;/span&gt; ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means k3s is starting, failing, and systemd is trying to restart it over and over. Time to check the logs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;journalctl &lt;span class="nt"&gt;-u&lt;/span&gt; k3s.service &lt;span class="nt"&gt;-f&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And there it was. The first smoking gun:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;level=fatal msg="Failed to start networking: unable to initialize network policy controller: error getting node subnet: failed to find interface with specified node ip"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This was a double-whammy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;K3s was starting before the &lt;code&gt;wlan0&lt;/code&gt; (Wi-Fi) interface had time to connect and get its &lt;code&gt;192.168.1.10&lt;/code&gt; IP. This is a classic race condition on reboot.&lt;/li&gt;
&lt;li&gt;K3s was still configured internally to use the old IP.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Part 2: The Service File Fixes
&lt;/h2&gt;

&lt;p&gt;The fix was to edit the systemd service file to (1) wait for the network and (2) force k3s to use the new IP for everything.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On the Pi&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/systemd/system/k3s.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I made four critical changes to the &lt;code&gt;[Service]&lt;/code&gt; section:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Added &lt;code&gt;ExecStartPre&lt;/code&gt;:&lt;/strong&gt; forces systemd to wait until the &lt;code&gt;wlan0&lt;/code&gt; interface actually has the IP address &lt;code&gt;192.168.1.10&lt;/code&gt; before trying to start k3s.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Added &lt;code&gt;--node-ip&lt;/code&gt;:&lt;/strong&gt; tells k3s what IP to use internally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Added &lt;code&gt;--node-external-ip&lt;/code&gt;:&lt;/strong&gt; tells k3s what IP to advertise externally (this was the fix for the IP conflict).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Added &lt;code&gt;--flannel-iface&lt;/code&gt;:&lt;/strong&gt; tells the flannel CNI which network interface to use.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;code&gt;[Service]&lt;/code&gt; section now looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;Service]
...
&lt;span class="nv"&gt;Restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;always
&lt;span class="nv"&gt;RestartSec&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5s

&lt;span class="c"&gt;# FIX 1: Wait for the wlan0 interface to have the correct IP&lt;/span&gt;
&lt;span class="nv"&gt;ExecStartPre&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/bin/sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s1"&gt;'while ! ip addr show wlan0 | grep -q "inet 192.168.1.10"; do sleep 1; done'&lt;/span&gt;

&lt;span class="c"&gt;# FIX 2: Hard-code the new IP and interface for k3s&lt;/span&gt;
&lt;span class="nv"&gt;ExecStart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/usr/local/bin/k3s &lt;span class="se"&gt;\&lt;/span&gt;
    server &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--node-ip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;192.168.1.10 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--node-external-ip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;192.168.1.10 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--flannel-iface&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;wlan0
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I reloaded systemd and restarted the service, full of confidence.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl daemon-reload
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart k3s.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;...and it still went into a crash loop.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 3: The "Aha!" Moment (The Corrupted Database)
&lt;/h2&gt;

&lt;p&gt;I was stumped. The service file was perfect. The IP was correct. The Pi was waiting for the network. Why was it still crashing?&lt;/p&gt;

&lt;p&gt;I watched the logs again (&lt;code&gt;sudo journalctl -u k3s.service -f&lt;/code&gt;) and saw something I'd missed.&lt;/p&gt;

&lt;p&gt;The service would start, run for about 15 seconds... and then crash. In that 15-second window, I saw this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I1030 20:37:56 ... "Successfully retrieved node IP(s)" IPs=["192.168.1.9"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;It was still finding the old IP!&lt;/p&gt;

&lt;p&gt;This was the "Aha!" moment. The k3s.service config flags were correct, but k3s was loading its old database, which was still full of references to the old .9 IP (like for the Traefik load balancer). It was seeing a conflict between its new config (.10) and its old database (.9), and crashing.&lt;/p&gt;

&lt;p&gt;The database was corrupted with stale data.&lt;/p&gt;

&lt;p&gt;The Real Fix: Nuke the database and let k3s rebuild it from scratch using the new, correct config.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On the Pi&lt;/span&gt;

&lt;span class="c"&gt;# 1. Stop k3s&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl stop k3s.service

&lt;span class="c"&gt;# 2. Delete the old, corrupted database&lt;/span&gt;
&lt;span class="nb"&gt;sudo rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/rancher/k3s/server/db/

&lt;span class="c"&gt;# 3. Start k3s&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl start k3s.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I checked the status one more time, and...&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;● k3s.service - Lightweight Kubernetes
     Active: active &lt;span class="o"&gt;(&lt;/span&gt;running&lt;span class="o"&gt;)&lt;/span&gt; since Thu 2025-10-30 20:55:03 IST&lt;span class="p"&gt;;&lt;/span&gt; 1min 9s ago
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It was stable. It worked. The cluster was back.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 4: The GitOps Restoration (Flux is Gone!)
&lt;/h2&gt;

&lt;p&gt;I went back to my Mac. &lt;code&gt;kubectl get nodes&lt;/code&gt; worked!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;NAME        STATUS   ROLES                  AGE   VERSION
shankarpi   Ready    control-plane,master   1m    v1.33.5+k3s1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But when I ran &lt;code&gt;flux get kustomizations&lt;/code&gt;, I got a new error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;✗ unable to retrieve the &lt;span class="nb"&gt;complete &lt;/span&gt;list of server APIs: kustomize.toolkit.fluxcd.io/v1: no matches &lt;span class="k"&gt;for&lt;/span&gt; ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Of course. When I deleted the database, I deleted everything—including the FluxCD installation and all its API definitions (CRDs).&lt;/p&gt;

&lt;p&gt;The cluster was healthy, but empty.&lt;/p&gt;

&lt;p&gt;Luckily, with a GitOps setup, this is the easiest fix in the world. I just had to re-bootstrap Flux.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On my Mac&lt;/span&gt;

&lt;span class="c"&gt;# 1. Set my GitHub Token&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GITHUB_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"ghp_..."&lt;/span&gt;

&lt;span class="c"&gt;# 2. Re-run the bootstrap command&lt;/span&gt;
flux bootstrap github &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--owner&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;tiwari91 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--repository&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;pi-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--branch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;main &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;./clusters/staging &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--personal&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This re-installed Flux, and it immediately started trying to deploy my apps.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 5: The Final "Gotcha" (The Missing SOPS Secret)
&lt;/h2&gt;

&lt;p&gt;I was so close. I ran &lt;code&gt;flux get kustomizations&lt;/code&gt; one last time. This is what I saw:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;NAME                READY   MESSAGE
apps                False   decryption failed &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="s1"&gt;'tunnel-credentials'&lt;/span&gt;: ...
flux-system         True    Applied revision: main@sha1:784af83f
infra...            False   decryption failed &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="s1"&gt;'renovate-container-env'&lt;/span&gt;: ...
monitoring-configs  False   decryption failed &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="s1"&gt;'grafana-tls-secret'&lt;/span&gt;: ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My &lt;code&gt;flux-system&lt;/code&gt; was running, but all my other apps were failing with &lt;code&gt;decryption failed&lt;/code&gt;. Why?&lt;/p&gt;

&lt;p&gt;When I reset the cluster, I also deleted the &lt;code&gt;sops-age&lt;/code&gt; secret that Flux uses to decrypt my files.&lt;/p&gt;

&lt;p&gt;The solution was to put that secret back.&lt;/p&gt;

&lt;p&gt;On my Mac, I deleted the (possibly stale) secret just in case.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl delete secret sops-age &lt;span class="nt"&gt;-n&lt;/span&gt; flux-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I re-created the secret from my local private key file (mine was named &lt;code&gt;age.agekey&lt;/code&gt;).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;age.agekey | kubectl create secret generic sops-age &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;flux-system &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--from-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;age.agekey&lt;span class="o"&gt;=&lt;/span&gt;/dev/stdin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I told Flux to try one last time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;flux reconcile kustomization apps &lt;span class="nt"&gt;--with-source&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Success! Flux found the key, decrypted the manifests, and all my namespaces and pods (linkding, audiobookshelf, monitoring) started spinning up.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR: The 3-Step Fix for a Dead k3s Pi
&lt;/h2&gt;

&lt;p&gt;If your k3s Pi cluster dies after an IP change:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix &lt;code&gt;k3s.service&lt;/code&gt;:&lt;/strong&gt; SSH into the Pi. Edit &lt;code&gt;/etc/systemd/system/k3s.service&lt;/code&gt; to add the &lt;code&gt;ExecStartPre&lt;/code&gt; line to wait for your network, and add the &lt;code&gt;--node-ip&lt;/code&gt;, &lt;code&gt;--node-external-ip&lt;/code&gt;, and &lt;code&gt;--flannel-iface&lt;/code&gt; flags with your new static IP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reset the Database:&lt;/strong&gt; The old IP is still in the database. Stop k3s, delete the DB, and restart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl stop k3s.service
&lt;span class="nb"&gt;sudo rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/rancher/k3s/server/db/
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl daemon-reload
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl start k3s.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Restore GitOps:&lt;/strong&gt; Your cluster is now empty.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run &lt;code&gt;flux bootstrap ...&lt;/code&gt; again to re-install Flux.&lt;/li&gt;
&lt;li&gt;Re-create your &lt;code&gt;sops-age&lt;/code&gt; secret: &lt;code&gt;cat age.agekey | kubectl create secret generic sops-age -n flux-system ...&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Force a reconcile: &lt;code&gt;flux reconcile kustomization apps --with-source&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
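&lt;p&gt;To watch the recovery converge, Flux can stream status until every kustomization reports ready:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Re-renders the status table as each kustomization moves to Ready
flux get kustomizations --watch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;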

&lt;p&gt;And just like that, my cluster was back from the dead.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>raspberrypi</category>
      <category>gitops</category>
      <category>flux</category>
    </item>
    <item>
      <title>From Zero to GitOps: Building a k3s Homelab on a Raspberry Pi with Flux &amp; SOPS</title>
      <dc:creator>Shankar</dc:creator>
      <pubDate>Thu, 30 Oct 2025 11:14:12 +0000</pubDate>
      <link>https://forem.com/shankar_t/from-zero-to-gitops-building-a-k3s-homelab-on-a-raspberry-pi-with-flux-sops-55b7</link>
      <guid>https://forem.com/shankar_t/from-zero-to-gitops-building-a-k3s-homelab-on-a-raspberry-pi-with-flux-sops-55b7</guid>
      <description>&lt;p&gt;This post documents the end-to-end process for setting up a &lt;code&gt;k3s&lt;/code&gt; Kubernetes cluster on a Raspberry Pi, managing it remotely from a Mac, and deploying applications securely using GitOps with FluxCD and SOPS encryption. We'll cover everything from OS install to deploying encrypted secrets and tackling common troubleshooting hurdles.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Initial Pi Setup &amp;amp; OS Installation
&lt;/h2&gt;

&lt;p&gt;This phase covers preparing the Raspberry Pi hardware and operating system.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Install OS:&lt;/strong&gt; Use the &lt;strong&gt;Raspberry Pi Imager&lt;/strong&gt; to write &lt;strong&gt;Raspberry Pi OS (64-BIT)&lt;/strong&gt; to an SD card.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;OS Configuration:&lt;/strong&gt; In the Imager's advanced settings, pre-configure:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hostname:&lt;/strong&gt; &lt;code&gt;k3s-node&lt;/code&gt; (or your preferred name)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Username and Password:&lt;/strong&gt; e.g., &lt;code&gt;pi-admin&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wireless LAN:&lt;/strong&gt; Your Wi-Fi SSID and password.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Set a Static IP:&lt;/strong&gt; To ensure a stable connection, set a &lt;strong&gt;DHCP Reservation&lt;/strong&gt; for the Pi in your home router's settings, linking the Pi's MAC address to a specific IP (e.g., &lt;code&gt;192.168.1.100&lt;/code&gt;).&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  2. Kubernetes (k3s) Installation
&lt;/h2&gt;

&lt;p&gt;We installed &lt;code&gt;k3s&lt;/code&gt;, a lightweight Kubernetes distribution, directly onto the Pi's operating system.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Enable Cgroups (Critical Fix):&lt;/strong&gt; The &lt;code&gt;k3s&lt;/code&gt; service will crash on startup without this Linux kernel feature.

&lt;ul&gt;
&lt;li&gt;SSH into the Pi: &lt;code&gt;ssh pi-admin@k3s-node.local&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Edit the boot config file: &lt;code&gt;sudo nano /boot/firmware/cmdline.txt&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;cgroup_memory=1 cgroup_enable=memory&lt;/code&gt; to the end of the single line in the file.&lt;/li&gt;
&lt;li&gt;Save (&lt;code&gt;Ctrl+X&lt;/code&gt;, &lt;code&gt;Y&lt;/code&gt;, &lt;code&gt;Enter&lt;/code&gt;) and reboot: &lt;code&gt;sudo reboot&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Install k3s:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On the Pi&lt;/span&gt;
curl &lt;span class="nt"&gt;-sfL&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;https://get.k3s.io]&lt;span class="o"&gt;(&lt;/span&gt;https://get.k3s.io&lt;span class="o"&gt;)&lt;/span&gt; | sh -
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Verify Service:&lt;/strong&gt; Ensure the &lt;code&gt;k3s&lt;/code&gt; service is stable and running.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On the Pi&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl status k3s.service
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;The output must show &lt;code&gt;Active: active (running)&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
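&lt;p&gt;The &lt;code&gt;cmdline.txt&lt;/code&gt; edit in step 1 can also be scripted instead of done in &lt;code&gt;nano&lt;/code&gt;. Below is a minimal sketch, assuming GNU &lt;code&gt;sed&lt;/code&gt;; &lt;code&gt;CMDLINE&lt;/code&gt; is a placeholder that defaults to a local demo file here, so on the Pi you would point it at &lt;code&gt;/boot/firmware/cmdline.txt&lt;/code&gt; and run the edit with &lt;code&gt;sudo&lt;/code&gt;:&lt;/p&gt;

```shell
# Append the cgroup flags to the single-line boot config, idempotently.
# CMDLINE is a stand-in; the demo file created below only illustrates the edit.
CMDLINE="${CMDLINE:-./cmdline.txt}"
[ -f "$CMDLINE" ] || echo 'console=tty1 root=PARTUUID=example rootfstype=ext4 rootwait' > "$CMDLINE"
# Only append if the flags are not already present, so re-running is safe
grep -q 'cgroup_enable=memory' "$CMDLINE" || \
  sed -i '$ s/$/ cgroup_memory=1 cgroup_enable=memory/' "$CMDLINE"
cat "$CMDLINE"
```

&lt;p&gt;The &lt;code&gt;grep&lt;/code&gt; guard keeps the file on one line and makes the edit safe to re-run.&lt;/p&gt;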




&lt;h2&gt;
  
  
  3. Remote Management from macOS
&lt;/h2&gt;

&lt;p&gt;To manage the Pi's cluster from your Mac, you need to copy its configuration.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Update Kubeconfig File:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;On the Pi, copy the config: &lt;code&gt;sudo cat /etc/rancher/k3s/k3s.yaml &amp;gt; k3s_config.yaml&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Edit the file (&lt;code&gt;nano k3s_config.yaml&lt;/code&gt;) and change the &lt;code&gt;server&lt;/code&gt; address from &lt;code&gt;https://127.0.0.1:6443&lt;/code&gt; to the Pi's static IP (e.g., &lt;code&gt;https://192.168.1.100:6443&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Copy to Mac:&lt;/strong&gt; From your Mac's terminal, copy the file to your local kubeconfig location. &lt;strong&gt;Warning:&lt;/strong&gt; This overwrites your default config. If you manage multiple clusters, merge this file's contents manually.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;scp pi-admin@k3s-node.local:~/k3s_config.yaml ~/.kube/config
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Test Connection:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get nodes
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;You should see your Pi node (&lt;code&gt;k3s-node&lt;/code&gt;) listed.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
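&lt;p&gt;The &lt;code&gt;server&lt;/code&gt; address edit in step 1 can likewise be done non-interactively. A sketch, assuming GNU &lt;code&gt;sed&lt;/code&gt;; &lt;code&gt;K3S_CFG&lt;/code&gt; and &lt;code&gt;PI_IP&lt;/code&gt; are placeholders, and the demo file created below only mimics the relevant line of &lt;code&gt;k3s.yaml&lt;/code&gt;:&lt;/p&gt;

```shell
# Rewrite the API server address in a copied kubeconfig non-interactively.
K3S_CFG="${K3S_CFG:-./k3s_config.yaml}"
PI_IP="${PI_IP:-192.168.1.100}"   # your Pi's static IP
# Demo stand-in for the copied /etc/rancher/k3s/k3s.yaml (illustration only)
[ -f "$K3S_CFG" ] || printf 'clusters:\n- cluster:\n    server: https://127.0.0.1:6443\n' > "$K3S_CFG"
sed -i "s|https://127\.0\.0\.1:6443|https://${PI_IP}:6443|" "$K3S_CFG"
grep 'server:' "$K3S_CFG"
```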




&lt;h2&gt;
  
  
  4. GitOps Setup with Flux &amp;amp; SOPS
&lt;/h2&gt;

&lt;p&gt;This phase automates deployments and configures secret encryption.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Bootstrap Flux:&lt;/strong&gt; Install Flux on the cluster and configure it to watch your Git repository.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On your Mac&lt;/span&gt;
&lt;span class="c"&gt;# (Ensure GITHUB_USER is set in your env)&lt;/span&gt;
flux bootstrap github &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--owner&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$GITHUB_USER&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--repository&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;pi-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--branch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;main &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;./clusters/staging &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--personal&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Generate &lt;code&gt;age&lt;/code&gt; Keypair:&lt;/strong&gt; Create a new keypair for encryption.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On your Mac&lt;/span&gt;
age-keygen &lt;span class="nt"&gt;-o&lt;/span&gt; age.agekey
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;This creates &lt;code&gt;age.agekey&lt;/code&gt; (your private key) and shows your public key (starts &lt;code&gt;age1...&lt;/code&gt;). Keep the private key safe!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Add Private Key to Cluster:&lt;/strong&gt; Create a Kubernetes secret in the &lt;code&gt;flux-system&lt;/code&gt; namespace containing your private key. This allows Flux's controllers to decrypt files.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On your Mac&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;age.agekey | kubectl create secret generic sops-age &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;flux-system &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--from-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;age.agekey&lt;span class="o"&gt;=&lt;/span&gt;/dev/stdin
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Configure SOPS Rules:&lt;/strong&gt; Create a &lt;code&gt;.sops.yaml&lt;/code&gt; file in &lt;code&gt;clusters/staging/&lt;/code&gt; to tell SOPS which public key to use for encrypting files.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In clusters/staging/.sops.yaml&lt;/span&gt;
&lt;span class="na"&gt;creation_rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path_regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.*.yaml&lt;/span&gt;
    &lt;span class="na"&gt;encrypted_regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;^(data|stringData)$&lt;/span&gt;
    &lt;span class="na"&gt;age&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;PASTE_YOUR_PUBLIC_AGE_KEY_HERE&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Configure Flux for Decryption:&lt;/strong&gt; Edit &lt;code&gt;clusters/staging/flux-system/kustomization.yaml&lt;/code&gt; to tell Flux to use the &lt;code&gt;sops-age&lt;/code&gt; secret.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In clusters/staging/flux-system/kustomization.yaml&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# ...&lt;/span&gt;
  &lt;span class="na"&gt;decryption&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sops&lt;/span&gt;
    &lt;span class="na"&gt;secretRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sops-age&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Commit &amp;amp; Push&lt;/strong&gt; your new &lt;code&gt;.sops.yaml&lt;/code&gt; and modified &lt;code&gt;kustomization.yaml&lt;/code&gt; to Git.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
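&lt;p&gt;When filling in the &lt;code&gt;age:&lt;/code&gt; field of &lt;code&gt;.sops.yaml&lt;/code&gt;, you can pull the public recipient back out of the key file instead of retyping it. A sketch, assuming the standard &lt;code&gt;age-keygen&lt;/code&gt; file format; the demo key written below is fake and exists only so the snippet runs on its own:&lt;/p&gt;

```shell
# age-keygen stores the public key as a comment line: "# public key: age1..."
AGE_KEY_FILE="${AGE_KEY_FILE:-./age.agekey}"
# Fake demo key so the snippet is self-contained (never commit a real one!)
[ -f "$AGE_KEY_FILE" ] || printf '# created: 2024-01-01\n# public key: age1exampledemo\nAGE-SECRET-KEY-1EXAMPLE\n' > "$AGE_KEY_FILE"
PUBKEY=$(grep 'public key:' "$AGE_KEY_FILE" | awk '{print $NF}')
echo "$PUBKEY"
```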




&lt;h2&gt;
  
  
  5. Deploying Encrypted Secrets via GitOps
&lt;/h2&gt;

&lt;p&gt;This is the process of creating encrypted secret files and adding them to your Git repository for Flux to deploy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Deploy the Cloudflare Secret
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Generate Secret YAML:&lt;/strong&gt; Create a YAML manifest from your Cloudflare credential file (&lt;code&gt;&amp;lt;tunnel_id&amp;gt;.json&lt;/code&gt;).&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create secret generic tunnel-credentials &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--from-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;credentials.json&lt;span class="o"&gt;=&lt;/span&gt;./&amp;lt;tunnel_id&amp;gt;.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; &amp;lt;your-app-namespace&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--dry-run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;client &lt;span class="nt"&gt;-o&lt;/span&gt; yaml &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; cloudflare-secret.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Encrypt the File:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sops &lt;span class="nt"&gt;--config&lt;/span&gt; clusters/staging/.sops.yaml &lt;span class="nt"&gt;--encrypt&lt;/span&gt; &lt;span class="nt"&gt;--in-place&lt;/span&gt; cloudflare-secret.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Move and Rename:&lt;/strong&gt; Move the encrypted secret into your application's directory (e.g., &lt;code&gt;apps/base/linkding/secret-cloudflare.sops.yaml&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Step 2: Deploy the Linkding Superuser Secret
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Generate Secret YAML:&lt;/strong&gt; Create a secret with the environment variables &lt;code&gt;linkding&lt;/code&gt; expects.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create secret generic linkding-superuser &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--from-literal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;LD_SUPERUSER_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-user &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--from-literal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;LD_SUPERUSER_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YourSecurePassword &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; &amp;lt;your-app-namespace&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--dry-run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;client &lt;span class="nt"&gt;-o&lt;/span&gt; yaml &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; secret-superuser.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Encrypt and Move:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sops &lt;span class="nt"&gt;--config&lt;/span&gt; clusters/staging/.sops.yaml &lt;span class="nt"&gt;--encrypt&lt;/span&gt; &lt;span class="nt"&gt;--in-place&lt;/span&gt; secret-superuser.yaml
&lt;span class="nb"&gt;mv &lt;/span&gt;secret-superuser.yaml apps/base/linkding/secret-superuser.sops.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Update &lt;code&gt;kustomization.yaml&lt;/code&gt;:&lt;/strong&gt; Edit &lt;code&gt;apps/base/linkding/kustomization.yaml&lt;/code&gt; to tell Flux to deploy these new secret files.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;namespace.yaml&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;deployment.yaml&lt;/span&gt;
  &lt;span class="c1"&gt;# ... other resources&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;secret-cloudflare.sops.yaml&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;secret-superuser.sops.yaml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Update &lt;code&gt;deployment.yaml&lt;/code&gt;:&lt;/strong&gt; Modify &lt;code&gt;apps/base/linkding/deployment.yaml&lt;/code&gt; to use the superuser secret.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In deployment.yaml, inside the container spec:&lt;/span&gt;
&lt;span class="na"&gt;envFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;linkding-superuser&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;
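&lt;p&gt;The &lt;code&gt;--dry-run=client -o yaml&lt;/code&gt; output stores values base64-encoded under &lt;code&gt;data:&lt;/code&gt;. Before encrypting, you can spot-check that a value decodes to what you expect; the string below is just the example password from step 1:&lt;/p&gt;

```shell
# base64 round-trip check for a Secret data value
echo 'WW91clNlY3VyZVBhc3N3b3Jk' | base64 -d; echo
# prints: YourSecurePassword
```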

&lt;h3&gt;
  
  
  Final Step: Commit and Reconcile
&lt;/h3&gt;

&lt;p&gt;After adding the files and updating the Kustomizations, commit everything to Git. Flux will automatically sync the changes, decrypt the secrets, and deploy them to your cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git add &lt;span class="nb"&gt;.&lt;/span&gt;
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"Add encrypted secrets for Cloudflare and Linkding"&lt;/span&gt;
git push origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  6. Troubleshooting Common Issues
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;ImagePullBackOff&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Kubernetes can't download the container image. You see this status when you run &lt;code&gt;kubectl get pods&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cause 1: Wrong Architecture.&lt;/strong&gt; You're trying to run an &lt;code&gt;amd64&lt;/code&gt; (standard PC/server) image on your &lt;code&gt;arm64&lt;/code&gt; Raspberry Pi.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Solution 1:&lt;/strong&gt; Find a multi-arch image or an &lt;code&gt;arm64&lt;/code&gt;/&lt;code&gt;aarch64&lt;/code&gt; specific version. Look for tags like &lt;code&gt;-arm64&lt;/code&gt;, &lt;code&gt;-aarch64&lt;/code&gt;, or check image descriptions on Docker Hub/GHCR. &lt;code&gt;lscr.io&lt;/code&gt; (LinuxServer.io) often provides good multi-arch images. Update the &lt;code&gt;image:&lt;/code&gt; tag in your &lt;code&gt;deployment.yaml&lt;/code&gt; and &lt;code&gt;git push&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cause 2: Pi Network/DNS Issues.&lt;/strong&gt; The Pi itself can't reach the container registry.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Solution 2:&lt;/strong&gt; SSH into the Pi.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Test basic connectivity: &lt;code&gt;ping google.com&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; Test DNS: &lt;code&gt;nslookup ghcr.io&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; If DNS fails, try setting static DNS servers: edit &lt;code&gt;/etc/dhcpcd.conf&lt;/code&gt; (&lt;code&gt;sudo nano /etc/dhcpcd.conf&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt; Add a line: &lt;code&gt;static domain_name_servers=8.8.8.8 1.1.1.1&lt;/code&gt; (Google/Cloudflare DNS).&lt;/li&gt;
&lt;li&gt; Save and reboot (&lt;code&gt;sudo reboot&lt;/code&gt;).&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
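&lt;p&gt;A quick way to tell the two causes apart from the Pi itself, before editing any manifests. A sketch; &lt;code&gt;ghcr.io&lt;/code&gt; stands in for whichever registry hosts your image:&lt;/p&gt;

```shell
# 1) Which architecture is this node? (arm64/aarch64 on a Pi 4/5)
uname -m
# 2) Can the node resolve the registry? getent uses the system resolver,
#    so it reflects what the container runtime will see.
if getent hosts ghcr.io > /dev/null; then echo "DNS OK"; else echo "DNS FAILED"; fi
```

&lt;p&gt;If the architecture check shows &lt;code&gt;aarch64&lt;/code&gt; but your image only ships &lt;code&gt;amd64&lt;/code&gt;, it's Cause 1; if the DNS check fails, it's Cause 2.&lt;/p&gt;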




&lt;h3&gt;
  
  
  Pod Stuck in &lt;code&gt;Pending&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; The pod stays in the &lt;code&gt;Pending&lt;/code&gt; state and never starts. Running &lt;code&gt;kubectl describe pod &amp;lt;pod-name&amp;gt;&lt;/code&gt; shows an event like &lt;code&gt;failed to bind volume&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cause:&lt;/strong&gt; The pod is waiting for a &lt;code&gt;PersistentVolumeClaim&lt;/code&gt; (PVC), but no suitable &lt;code&gt;PersistentVolume&lt;/code&gt; (PV) is available to fulfill it (e.g., wrong size, access mode, storage class, or no PVs exist).&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Solution (Simple &lt;code&gt;hostPath&lt;/code&gt; PV for Homelab):&lt;/strong&gt; Define a &lt;code&gt;PersistentVolume&lt;/code&gt; in your GitOps repo that uses a directory on the Pi's filesystem. &lt;strong&gt;Warning:&lt;/strong&gt; &lt;code&gt;hostPath&lt;/code&gt; ties the data to that specific Pi node.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Create a directory on the Pi:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /mnt/data/my-app-data &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo chown &lt;/span&gt;nobody:nogroup /mnt/data/my-app-data
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create a &lt;code&gt;pv.yaml&lt;/code&gt; manifest in your GitOps repo (e.g., alongside the PVC):&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-app-data-pv # Unique name
spec:
  capacity:
    storage: 5Gi # Must be &amp;gt;= the PVC request
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce # Must match the PVC
  persistentVolumeReclaimPolicy: Retain # Keep data if the PV is deleted
  storageClassName: manual # Give it a name
  hostPath:
    path: "/mnt/data/my-app-data" # Path on the Pi node
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add &lt;code&gt;pv.yaml&lt;/code&gt; to your Kustomization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update your application's &lt;strong&gt;PVC&lt;/strong&gt; &lt;code&gt;spec.storageClassName&lt;/code&gt; to &lt;code&gt;manual&lt;/code&gt; (or whatever name you chose) so it binds to this PV.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;code&gt;Connection Refused&lt;/code&gt; / &lt;code&gt;ServiceUnavailable&lt;/code&gt; (from remote &lt;code&gt;kubectl&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Running &lt;code&gt;kubectl get nodes&lt;/code&gt; from your Mac or remote machine fails with &lt;code&gt;ServiceUnavailable&lt;/code&gt; or &lt;code&gt;connection refused&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cause:&lt;/strong&gt; The &lt;code&gt;k3s&lt;/code&gt; service on the Pi is down, restarting, or unstable. This is &lt;em&gt;almost always&lt;/em&gt; caused by:

&lt;ol&gt;
&lt;li&gt; Forgetting the &lt;strong&gt;&lt;code&gt;cgroups&lt;/code&gt; fix&lt;/strong&gt; (critical for &lt;code&gt;k3s&lt;/code&gt; on Raspberry Pi OS).&lt;/li&gt;
&lt;li&gt; The Pi is out of resources (memory/CPU).&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution:&lt;/strong&gt;

&lt;ol&gt;
&lt;li&gt; SSH into the Pi.&lt;/li&gt;
&lt;li&gt; Check the &lt;code&gt;k3s&lt;/code&gt; service status: &lt;code&gt;sudo systemctl status k3s.service&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; If it's not &lt;code&gt;active (running)&lt;/code&gt;, check the logs for crash reasons: &lt;code&gt;sudo journalctl -u k3s.service -f&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Confirm &lt;code&gt;/boot/firmware/cmdline.txt&lt;/code&gt; includes the &lt;code&gt;cgroup_memory=1 cgroup_enable=memory&lt;/code&gt; flags (and reboot if you had to add them).&lt;/li&gt;
&lt;li&gt; Check resource usage with &lt;code&gt;htop&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>kubernetes</category>
      <category>gitops</category>
      <category>raspberrypi</category>
      <category>k3s</category>
    </item>
    <item>
      <title>Troubleshooting k3s on Raspberry Pi (Fixing the Auto-Restart Crash Loop)</title>
      <dc:creator>Shankar</dc:creator>
      <pubDate>Thu, 30 Oct 2025 11:05:56 +0000</pubDate>
      <link>https://forem.com/shankar_t/troubleshooting-k3s-on-raspberry-pi-fixing-the-auto-restart-crash-loop-jd3</link>
      <guid>https://forem.com/shankar_t/troubleshooting-k3s-on-raspberry-pi-fixing-the-auto-restart-crash-loop-jd3</guid>
      <description>&lt;p&gt;So you've installed &lt;code&gt;k3s&lt;/code&gt; (the lightweight Kubernetes distribution) on your Raspberry Pi, but your &lt;code&gt;kubectl&lt;/code&gt; commands from your main computer are failing with "connection refused"? You SSH into the Pi, check the service status (&lt;code&gt;sudo systemctl status k3s.service&lt;/code&gt;), and see it stuck in &lt;code&gt;activating (auto-restart)&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;You're likely facing a very common issue, especially on Raspberry Pi OS. Let's diagnose it and fix it step-by-step.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Symptoms
&lt;/h2&gt;

&lt;p&gt;You'll typically see two related problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;On your Raspberry Pi:&lt;/strong&gt; The &lt;code&gt;k3s&lt;/code&gt; Kubernetes service itself is crashing and getting stuck in a restart loop.

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;sudo systemctl status k3s.service&lt;/code&gt; shows &lt;code&gt;Active: activating (auto-restart)&lt;/code&gt; or mentions &lt;code&gt;code=exited, status=1/FAILURE&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Running &lt;code&gt;kubectl get nodes&lt;/code&gt; &lt;em&gt;on the Pi&lt;/em&gt; (with &lt;code&gt;sudo k3s kubectl&lt;/code&gt;) might intermittently work or show &lt;code&gt;The connection to the server 127.0.0.1:6443 was refused&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;On your Remote Machine (e.g., Mac/Linux/Windows):&lt;/strong&gt; Your &lt;code&gt;kubectl&lt;/code&gt; commands targeting the Pi fail consistently with connection errors (like &lt;code&gt;connection refused&lt;/code&gt; or timeouts) because the &lt;code&gt;k3s&lt;/code&gt; API server on the Pi isn't reliably available.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The core issue we need to solve is &lt;strong&gt;on the Raspberry Pi&lt;/strong&gt;: why is &lt;code&gt;k3s&lt;/code&gt; crashing?&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: &lt;code&gt;k3s&lt;/code&gt; is Crashing
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;systemctl status k3s.service&lt;/code&gt; output tells the story:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;Active: activating (auto-restart)&lt;/code&gt;&lt;/strong&gt;: The service manager (&lt;code&gt;systemd&lt;/code&gt;) is trying to start &lt;code&gt;k3s&lt;/code&gt;, it fails, and &lt;code&gt;systemd&lt;/code&gt; automatically tries again, repeatedly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;(code=exited, status=1/FAILURE)&lt;/code&gt;&lt;/strong&gt;: This confirms the main &lt;code&gt;k3s&lt;/code&gt; process crashed with an error.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;connection refused&lt;/code&gt; errors happen because &lt;code&gt;kubectl&lt;/code&gt; tries to talk to the &lt;code&gt;k3s&lt;/code&gt; API server while it's down during one of these crashes or restarts.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Most Likely Cause on Raspberry Pi
&lt;/h2&gt;

&lt;p&gt;For Raspberry Pi setups, this crash-restart loop is almost &lt;em&gt;always&lt;/em&gt; due to &lt;strong&gt;missing Linux kernel features required by &lt;code&gt;k3s&lt;/code&gt;&lt;/strong&gt;, specifically related to &lt;strong&gt;Control Groups (cgroups)&lt;/strong&gt; for memory management. The &lt;code&gt;k3s&lt;/code&gt; installation script often warns about this, but it's an easy step to miss.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Fix: Enable Cgroups and Reinstall k3s
&lt;/h2&gt;

&lt;p&gt;We'll ensure the kernel is configured correctly and then perform a clean re-installation of &lt;code&gt;k3s&lt;/code&gt; to fix any potentially corrupted state from the failed starts.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Enable Cgroups on the Raspberry Pi
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;SSH into your Raspberry Pi using its hostname or IP address:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh &amp;lt;your_pi_user&amp;gt;@&amp;lt;your_pi_hostname_or_ip&amp;gt;
&lt;span class="c"&gt;# e.g., ssh pi-admin@k3s-node.local&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Edit the boot configuration file using &lt;code&gt;sudo&lt;/code&gt; permissions:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /boot/firmware/cmdline.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This file contains a &lt;strong&gt;single, long line&lt;/strong&gt; of text. Use your arrow keys to navigate to the &lt;strong&gt;very end&lt;/strong&gt; of that line.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add a single &lt;strong&gt;space&lt;/strong&gt;, and then paste the following text:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cgroup_memory=1 cgroup_enable=memory
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;em&gt;(Make absolutely sure the entire file content remains on one single line!)&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Save the file by pressing &lt;code&gt;Ctrl + X&lt;/code&gt;, then &lt;code&gt;Y&lt;/code&gt;, then &lt;code&gt;Enter&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  2. Cleanly Uninstall k3s
&lt;/h3&gt;

&lt;p&gt;Let's remove the current (likely broken) installation.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Run the official uninstall script (this stops the service and removes &lt;code&gt;k3s&lt;/code&gt; files):&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run this command if it exists&lt;/span&gt;
/usr/local/bin/k3s-uninstall.sh

&lt;span class="c"&gt;# If the above gives "command not found", try this one:&lt;/span&gt;
&lt;span class="c"&gt;# /usr/local/bin/k3s-agent-uninstall.sh&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Now, &lt;strong&gt;reboot&lt;/strong&gt; the Pi to apply the kernel changes from Step 1:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;reboot
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  3. Reinstall &lt;code&gt;k3s&lt;/code&gt; and Verify
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; Wait for the Pi to reboot, then SSH back in.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run the installation script again:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sfL&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;https://get.k3s.io]&lt;span class="o"&gt;(&lt;/span&gt;https://get.k3s.io&lt;span class="o"&gt;)&lt;/span&gt; | sh -
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Give it a minute to start up, then check the service status again:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl status k3s.service
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;You should now see the glorious green text: &lt;strong&gt;&lt;code&gt;Active: active (running)&lt;/code&gt;&lt;/strong&gt;. If it's stable and running, you've fixed the main issue!&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Final Step: Update Your Remote Kubeconfig
&lt;/h2&gt;

&lt;p&gt;Because &lt;code&gt;k3s&lt;/code&gt; was reinstalled, it has generated new security certificates. The old configuration file on your Mac is now invalid. You need to copy the new one over.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On the Pi:&lt;/strong&gt; Copy the new config to your home directory:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo cat&lt;/span&gt; /etc/rancher/k3s/k3s.yaml &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$HOME&lt;/span&gt;/k3s_config.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;On the Pi:&lt;/strong&gt; &lt;strong&gt;Edit&lt;/strong&gt; the copied file (&lt;code&gt;nano $HOME/k3s_config.yaml&lt;/code&gt;) and change the &lt;code&gt;server:&lt;/code&gt; address from &lt;code&gt;https://127.0.0.1:6443&lt;/code&gt; to use your Pi's &lt;strong&gt;static IP address&lt;/strong&gt; (e.g., &lt;code&gt;https://192.168.1.100:6443&lt;/code&gt;). Save and exit.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On your Mac:&lt;/strong&gt; Use &lt;code&gt;scp&lt;/code&gt; to copy the updated file, replacing your old config. (Remember: back up &lt;code&gt;~/.kube/config&lt;/code&gt; first if you have other cluster contexts in it!)&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On your Mac - use your Pi's user/hostname/IP&lt;/span&gt;
scp &amp;lt;your_pi_user&amp;gt;@&amp;lt;your_pi_hostname_or_ip&amp;gt;:~/k3s_config.yaml ~/.kube/config
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Test Again:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On your Mac&lt;/span&gt;
kubectl get nodes
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Your &lt;code&gt;kubectl&lt;/code&gt; commands should now connect successfully and consistently!&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>kubernetes</category>
      <category>k3s</category>
      <category>raspberrypi</category>
      <category>homelab</category>
    </item>
    <item>
      <title>How to Use kubectl Directly on Your Raspberry Pi k3s Node</title>
      <dc:creator>Shankar</dc:creator>
      <pubDate>Thu, 30 Oct 2025 11:02:49 +0000</pubDate>
      <link>https://forem.com/shankar_t/how-to-use-kubectl-directly-on-your-raspberry-pi-k3s-node-1o9</link>
      <guid>https://forem.com/shankar_t/how-to-use-kubectl-directly-on-your-raspberry-pi-k3s-node-1o9</guid>
      <description>&lt;p&gt;You've set up a &lt;code&gt;k3s&lt;/code&gt; Kubernetes cluster on your Raspberry Pi and deployed an application. While managing it remotely with &lt;code&gt;kubectl&lt;/code&gt; from your main computer is great, sometimes you need to quickly check pod status or logs directly on the Pi itself.&lt;/p&gt;

&lt;p&gt;You might notice that just typing &lt;code&gt;kubectl get pods&lt;/code&gt; on the Pi gives you a connection error. That's because the standard &lt;code&gt;kubectl&lt;/code&gt; command doesn't automatically know where to find the &lt;code&gt;k3s&lt;/code&gt; cluster configuration or have the right permissions.&lt;/p&gt;

&lt;p&gt;Luckily, &lt;code&gt;k3s&lt;/code&gt; provides a handy wrapper command! Here's how to use it:&lt;/p&gt;




&lt;h2&gt;
  
  
  Steps
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;SSH into your Raspberry Pi:&lt;/strong&gt;&lt;br&gt;
Connect to your Pi using its hostname or IP address.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh &amp;lt;your_pi_user&amp;gt;@&amp;lt;your_pi_hostname_or_ip&amp;gt;
&lt;span class="c"&gt;# Example: ssh pi-admin@k3s-node.local&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Run the &lt;code&gt;k3s kubectl&lt;/code&gt; Command:&lt;/strong&gt;&lt;br&gt;
Prefix your usual &lt;code&gt;kubectl&lt;/code&gt; commands with &lt;code&gt;sudo k3s kubectl&lt;/code&gt;. This special command automatically uses the correct admin configuration (&lt;code&gt;/etc/rancher/k3s/k3s.yaml&lt;/code&gt;) and runs with the necessary permissions.&lt;/p&gt;

&lt;p&gt;To check your running pods:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On the Pi&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;k3s kubectl get pods &lt;span class="nt"&gt;-A&lt;/span&gt; &lt;span class="c"&gt;# -A shows pods in all namespaces&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Or, if you know the namespace (e.g., &lt;code&gt;default&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On the Pi&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;k3s kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; default
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Check Pod Logs (Optional but useful):&lt;/strong&gt;&lt;br&gt;
First, get the full name of the pod you're interested in from the &lt;code&gt;get pods&lt;/code&gt; command above (it will look something like &lt;code&gt;my-app-deployment-xxxxxxxxxx-xxxxx&lt;/code&gt;). Then, view its logs:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On the Pi - Replace &amp;lt;your-pod-name&amp;gt; and &amp;lt;namespace&amp;gt;&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;k3s kubectl logs &lt;span class="nt"&gt;-f&lt;/span&gt; &amp;lt;your-pod-name&amp;gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &amp;lt;namespace&amp;gt;
&lt;span class="c"&gt;# Example: sudo k3s kubectl logs -f my-app-deployment-7f8c9d4b4f-g2hjl -n default&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;The &lt;code&gt;-f&lt;/code&gt; flag follows the logs in real-time, showing you the latest output from your application's container directly in the Pi's terminal.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
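
&lt;p&gt;One optional convenience (not required by &lt;code&gt;k3s&lt;/code&gt;, just a suggestion): if typing the full prefix gets tedious, you can define a shell alias on the Pi. The alias name &lt;code&gt;k&lt;/code&gt; here is arbitrary.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On the Pi - add to ~/.bashrc to make it permanent&lt;/span&gt;
&lt;span class="nb"&gt;alias &lt;/span&gt;&lt;span class="nv"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'sudo k3s kubectl'&lt;/span&gt;
k get pods &lt;span class="nt"&gt;-A&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;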




&lt;p&gt;That's all there is to it! Using &lt;code&gt;sudo k3s kubectl&lt;/code&gt; is the straightforward way to interact with your &lt;code&gt;k3s&lt;/code&gt; cluster directly on the node it's running on.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>k3s</category>
      <category>raspberrypi</category>
      <category>devops</category>
    </item>
    <item>
      <title>Setting Up GitOps with Flux on a Kubernetes Cluster</title>
      <dc:creator>Shankar</dc:creator>
      <pubDate>Thu, 30 Oct 2025 11:00:10 +0000</pubDate>
      <link>https://forem.com/shankar_t/setting-up-gitops-with-flux-on-a-kubernetes-cluster-5d8l</link>
      <guid>https://forem.com/shankar_t/setting-up-gitops-with-flux-on-a-kubernetes-cluster-5d8l</guid>
      <description>&lt;p&gt;Ready to automate your Kubernetes deployments? GitOps is the way to go, and FluxCD is a fantastic tool to make it happen. This guide walks you through the initial setup: installing Flux on your cluster and connecting it to your GitHub repository. Let's get started!&lt;/p&gt;




&lt;h2&gt;
  
  
  What's GitOps, Anyway?
&lt;/h2&gt;

&lt;p&gt;In simple terms, GitOps means using a Git repository as the &lt;strong&gt;single source of truth&lt;/strong&gt; for your desired infrastructure and application state. Flux is the operator that runs in your Kubernetes cluster, constantly comparing the cluster's live state to the state defined in your Git repo. If they differ, Flux automatically makes changes to the cluster to match the repo. Magic! &lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before we begin, make sure you have:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;A Kubernetes Cluster:&lt;/strong&gt; Any cluster will do (like &lt;code&gt;k3s&lt;/code&gt; on a Raspberry Pi, &lt;code&gt;minikube&lt;/code&gt;, or a cloud provider's offering). Ensure &lt;code&gt;kubectl&lt;/code&gt; is configured to access it.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;A GitHub Account:&lt;/strong&gt; We'll use GitHub to host our configuration repository.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;A GitHub Personal Access Token (PAT):&lt;/strong&gt; Flux needs this to create deploy keys and potentially commit manifests back to your repository during the bootstrap process.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  1. Create a GitHub Personal Access Token (PAT)
&lt;/h2&gt;

&lt;p&gt;Flux needs permissions to interact with your repository.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Go to your GitHub &lt;strong&gt;Settings&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Navigate to &lt;strong&gt;Developer settings&lt;/strong&gt; (usually near the bottom left).&lt;/li&gt;
&lt;li&gt; Click on &lt;strong&gt;Personal access tokens&lt;/strong&gt; -&amp;gt; &lt;strong&gt;Tokens (classic)&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Click &lt;strong&gt;Generate new token&lt;/strong&gt; -&amp;gt; &lt;strong&gt;Generate new token (classic)&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Give it a descriptive &lt;strong&gt;Note&lt;/strong&gt; (e.g., "flux-bootstrap").&lt;/li&gt;
&lt;li&gt; Set an &lt;strong&gt;Expiration&lt;/strong&gt; (e.g., 90 days - &lt;em&gt;remember to rotate it!&lt;/em&gt;).&lt;/li&gt;
&lt;li&gt; Select the &lt;strong&gt;&lt;code&gt;repo&lt;/code&gt;&lt;/strong&gt; scope. This grants permissions needed for Flux to manage repository configuration.&lt;/li&gt;
&lt;li&gt; Click &lt;strong&gt;Generate token&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Immediately copy the generated token!&lt;/strong&gt; You won't see it again.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Store the Token Securely (Temporarily in ENV):&lt;/strong&gt; For the bootstrap command, export it as an environment variable in your terminal. &lt;strong&gt;Never commit this token to Git!&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GITHUB_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;paste-your-token-here&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Export Your GitHub Username:&lt;/strong&gt; Flux also needs your username.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GITHUB_USER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;your-github-username&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;
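
&lt;p&gt;Before running the bootstrap, a quick optional sanity check confirms both variables are set in the current shell, without printing the token itself:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$GITHUB_USER&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="c"&gt;# Print only the token's length as a set/unset check&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;${#GITHUB_TOKEN}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;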




&lt;h2&gt;
  
  
  2. Install the Flux CLI
&lt;/h2&gt;

&lt;p&gt;You'll need the &lt;code&gt;flux&lt;/code&gt; command-line tool to interact with Flux. Installation methods vary by OS:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;macOS (Homebrew):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;fluxcd/tap/flux
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Linux/Other (Curl):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;https://fluxcd.io/install.sh]&lt;span class="o"&gt;(&lt;/span&gt;https://fluxcd.io/install.sh&lt;span class="o"&gt;)&lt;/span&gt; | &lt;span class="nb"&gt;sudo &lt;/span&gt;bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;(Check the official Flux documentation for other methods)&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Verify the installation:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
bash
which flux
flux --version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
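
&lt;p&gt;With the CLI installed and &lt;code&gt;kubectl&lt;/code&gt; pointing at your cluster, you can also run Flux's built-in pre-flight check to confirm the cluster meets its requirements before bootstrapping:&lt;/p&gt;

&lt;pre class="highlight shell"&gt;&lt;code&gt;flux check &lt;span class="nt"&gt;--pre&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;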

</description>
      <category>kubernetes</category>
      <category>gitops</category>
      <category>fluxcd</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
