<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: 0xAlphaSecurity</title>
    <description>The latest articles on Forem by 0xAlphaSecurity (@0xalphasecurity).</description>
    <link>https://forem.com/0xalphasecurity</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3692933%2F305306af-d7ec-4e52-8bf1-3316fd6f6ecf.png</url>
      <title>Forem: 0xAlphaSecurity</title>
      <link>https://forem.com/0xalphasecurity</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/0xalphasecurity"/>
    <language>en</language>
    <item>
      <title>Chapter 5: Linux Control Groups (cgroups)</title>
      <dc:creator>0xAlphaSecurity</dc:creator>
      <pubDate>Sun, 22 Mar 2026 20:23:15 +0000</pubDate>
      <link>https://forem.com/0xalphasecurity/chapter-5-linux-control-groups-cgroups-2mbm</link>
      <guid>https://forem.com/0xalphasecurity/chapter-5-linux-control-groups-cgroups-2mbm</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This post is part of the Ultimate Container Security Series, a structured, multi-part guide covering container security from foundational concepts to runtime protection. For an overview of the series structure, scope, and update schedule, &lt;a href="https://dev.to/0xalphasecurity/ultimate-container-security-series-2628"&gt;see the series introduction post here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When dozens or hundreds of applications share the same Linux system, managing their access to hardware resources, like CPU, memory, and disk I/O, becomes an absolute necessity. Without strict boundaries, a single misbehaving or compromised process can easily consume all available resources. This starves other applications, degrades system performance, and can even bring the entire host down.&lt;/p&gt;

&lt;p&gt;From a security perspective, an attacker exploiting an unbounded application can intentionally cause this resource exhaustion, resulting in a severe Denial of Service (DoS). Because containers are ultimately just processes running on a shared host kernel, they are equally susceptible to this risk. To keep services stable and secure, we need a way to enforce fairness and strict isolation.&lt;/p&gt;

&lt;p&gt;In this chapter, we will explore Linux Control Groups (cgroups), a powerful kernel feature that allows us to limit and isolate the resource usage of processes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction to cgroups
&lt;/h2&gt;

&lt;p&gt;At its core, &lt;strong&gt;cgroups v2&lt;/strong&gt; is a Linux kernel mechanism that allows the system to organize processes into hierarchical groups and apply strict resource limits to them. With cgroups, administrators and container runtimes can precisely dictate how much CPU time, memory, and disk I/O throughput a specific set of processes is allowed to consume.&lt;/p&gt;

&lt;p&gt;Understanding how cgroups operate is essential because they are the mechanism Linux uses to enforce resource fairness at the kernel level.&lt;/p&gt;

&lt;p&gt;Consider a scenario where a process is allowed to consume unlimited memory. It will eventually starve other critical processes on the same host. This might happen inadvertently due to a bug, like a memory leak in a poorly written application. From a security perspective, however, an attacker can deliberately trigger or exploit such a leak to mount a resource exhaustion attack. By strictly capping the memory and other resources a containerized process can access, you contain the blast radius of this kind of attack and keep the rest of the host operating normally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;cgroups v1 vs. cgroups v2&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Control groups have been around for a long time, but the ecosystem has fundamentally shifted. Version 2 of cgroups has been part of the mainline Linux kernel since 2016 (Linux 4.5), and Fedora became the first major distribution to use it by default in late 2019 (Fedora 31). Today it is the standard for modern Linux systems and orchestration platforms.&lt;/p&gt;

&lt;p&gt;The biggest architectural difference lies in how processes are grouped. In &lt;strong&gt;cgroups v1&lt;/strong&gt;, controllers (the mechanisms that actually govern resources like memory or PIDs) were completely independent. A single process could belong to entirely different groups for different resources. For example, a process could simultaneously join &lt;code&gt;/sys/fs/cgroup/memory/mygroup&lt;/code&gt; and &lt;code&gt;/sys/fs/cgroup/pids/yourgroup&lt;/code&gt;. This fragmented design led to incredibly complex, confusing hierarchies that were hard to manage and secure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cgroups v2&lt;/strong&gt; fixes this by introducing a single unified hierarchy. The semantics are much cleaner: a process joins one specific group (e.g., &lt;code&gt;/sys/fs/cgroup/ourgroup&lt;/code&gt;) and is automatically subject to all the active controllers configured for that group.&lt;/p&gt;

&lt;p&gt;Beyond making resource management much easier to reason about, &lt;strong&gt;cgroups v2&lt;/strong&gt; brings several massive improvements to stability and security:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Safer Sub-tree Delegation&lt;/strong&gt;: Management of a cgroup sub-tree can be safely delegated to less-privileged users. This is a crucial feature for rootless containers, because it lets resource limits be applied without requiring root privileges.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified Memory Accounting&lt;/strong&gt;: It properly accounts for types of memory usage that v1 missed or handled poorly, including network memory, kernel memory, and deferred effects such as page cache write-back.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pressure Stall Information (PSI)&lt;/strong&gt;: A newer feature that provides rich, real-time metrics on system resource pressure, allowing systems to proactively detect and respond to resource shortages before a crash occurs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced Isolation&lt;/strong&gt;: Better cross-resource allocation management prevents edge-case scenarios where high usage of one resource unexpectedly impacts another.&lt;/li&gt;
&lt;/ul&gt;
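&lt;p&gt;PSI data is exposed per cgroup through files such as &lt;code&gt;cpu.pressure&lt;/code&gt;, &lt;code&gt;memory.pressure&lt;/code&gt;, and &lt;code&gt;io.pressure&lt;/code&gt;, each containing a &lt;code&gt;some&lt;/code&gt; line and (usually) a &lt;code&gt;full&lt;/code&gt; line with 10-, 60-, and 300-second averages. The sketch below extracts the 10-second average. It assumes a kernel built with PSI support, and the helper name &lt;code&gt;psi_avg10&lt;/code&gt; is hypothetical, not part of any standard tool.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: print the avg10 value (percentage of time stalled over
# the last 10 seconds) from a PSI file such as memory.pressure.
# Assumes the kernel line format: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
psi_avg10() {
    file=$1
    kind=${2:-some}   # "some" = at least one task stalled, "full" = all non-idle tasks stalled
    sed -n "s/^$kind .*avg10=\([0-9.]*\).*/\1/p" "$file"
}
```

&lt;p&gt;For example, &lt;code&gt;psi_avg10 /sys/fs/cgroup/system.slice/memory.pressure full&lt;/code&gt; would report how much of the last 10 seconds all non-idle tasks in that slice spent stalled on memory at the same time.&lt;/p&gt;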

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Throughout this guide, we will focus entirely on cgroups v2, as it is the modern implementation used by secure container environments.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Before we dive deeper, you should verify that your host system is actually running cgroups v2. You can easily check this by querying the filesystem type of the cgroup mount point:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;stat&lt;/span&gt; &lt;span class="nt"&gt;-fc&lt;/span&gt; %T /sys/fs/cgroup/
cgroup2fs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the output reads &lt;code&gt;cgroup2fs&lt;/code&gt;, you are ready to go. If it reads &lt;code&gt;tmpfs&lt;/code&gt; or &lt;code&gt;cgroupfs&lt;/code&gt;, your system is still using the legacy cgroups v1 hierarchy.&lt;/p&gt;
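&lt;p&gt;If you want to script this check (for example, as part of provisioning a test VM), a minimal sketch might look like the following. The function names are hypothetical, and the mapping simply encodes the rule above: &lt;code&gt;cgroup2fs&lt;/code&gt; means v2, while &lt;code&gt;tmpfs&lt;/code&gt; or &lt;code&gt;cgroupfs&lt;/code&gt; points to v1.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helpers: report which cgroup hierarchy is mounted at a path.

# Map the filesystem type reported by `stat -fc %T` to a hierarchy label.
classify_cgroup_fs() {
    case "$1" in
        cgroup2fs)      echo "v2" ;;
        tmpfs|cgroupfs) echo "v1" ;;
        *)              echo "unknown" ;;
    esac
}

# Query the mount point (defaults to /sys/fs/cgroup) and classify it.
# If stat fails (e.g. the path does not exist), the result is "unknown".
cgroup_mode() {
    fstype=$(stat -fc %T "${1:-/sys/fs/cgroup}" 2>/dev/null)
    classify_cgroup_fs "$fstype"
}
```

&lt;p&gt;Calling &lt;code&gt;cgroup_mode&lt;/code&gt; with no arguments checks &lt;code&gt;/sys/fs/cgroup&lt;/code&gt; and should print &lt;code&gt;v2&lt;/code&gt; on the Ubuntu Server 24.04 VM used in this series.&lt;/p&gt;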



&lt;h2&gt;
  
  
  Exploring Cgroups
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Warning: Always run the commands in this guide on a disposable Virtual Machine (VM), never on your personal host machine. Playing with kernel resource limits can easily freeze or crash your system! The examples in this series were run on an Ubuntu Server 24.04 VM.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The core idea behind cgroups is elegantly simple: &lt;strong&gt;Processes are organized into hierarchical groups, and each group is assigned specific resource limits.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In Linux, "everything is a file," and cgroups are no exception. There is no special CLI tool you must use to interact with them. Instead, cgroups are exposed directly through a virtual filesystem, usually mounted at &lt;code&gt;/sys/fs/cgroup&lt;/code&gt;. Inside this directory, groups are represented as folders, and resource limits are represented as plain text files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Writing values into these text files directly changes the kernel's behavior.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's look at the root of the cgroups v2 filesystem by running &lt;code&gt;ls /sys/fs/cgroup/&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqqboj8a1gu4hh4pmy1i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqqboj8a1gu4hh4pmy1i.png" alt=" " width="800" height="87"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This directory is the &lt;strong&gt;root control group&lt;/strong&gt;. Every process running on your Linux machine belongs either to this root group or to one of its descendants.&lt;/p&gt;

&lt;p&gt;On modern Linux systems that use systemd, the cgroups v2 filesystem mounted at &lt;code&gt;/sys/fs/cgroup&lt;/code&gt; forms a hierarchical tree in which processes are organized and managed. The root directory represents the &lt;strong&gt;root control group&lt;/strong&gt;, and &lt;code&gt;systemd&lt;/code&gt; automatically creates subgroups such as &lt;code&gt;init.scope&lt;/code&gt; (which contains the system’s PID 1 process), &lt;code&gt;system.slice&lt;/code&gt; (which holds system services and daemons), and &lt;code&gt;user.slice&lt;/code&gt; (which organizes user sessions). Container runtimes and orchestrators place their workloads in this same tree. For example, when Docker uses the systemd cgroup driver (typically the default on cgroups v2 hosts), the daemon itself runs in &lt;code&gt;system.slice/docker.service&lt;/code&gt;, while each container gets its own scope alongside it, such as &lt;code&gt;system.slice/docker-&amp;lt;container-id&amp;gt;.scope&lt;/code&gt;. Kubernetes, in turn, typically places pods under a dedicated &lt;code&gt;kubepods.slice&lt;/code&gt;. This means containers are still part of the same overall cgroup hierarchy as every other process, just placed deeper in the tree:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/sys/fs/cgroup (root)
│
├── init.scope
│
├── system.slice
│   ├── docker.service
│   ├── docker-&amp;lt;container-id&amp;gt;.scope
│   │
│   └── ssh.service
│
└── user.slice
    └── user-1000.slice
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Creating a new subdirectory anywhere in this tree creates a new child cgroup. The child is bounded by its parent's limits and can only use the controllers its parent makes available to it.&lt;/p&gt;

&lt;p&gt;If you look closely at the files in a cgroup directory, you'll notice a strict naming convention. Files are divided into two main categories: &lt;strong&gt;Core files&lt;/strong&gt; and &lt;strong&gt;Controller files&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Core Files (&lt;code&gt;cgroup.*&lt;/code&gt;)&lt;/strong&gt;: Files prefixed with &lt;code&gt;cgroup.&lt;/code&gt; manage the mechanics of the cgroup hierarchy itself, rather than specific hardware resources.

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cgroup.procs&lt;/code&gt;: The most important file. It contains a list of Process IDs (PIDs) that belong to this group. To move a process into a cgroup, you simply echo its PID into this file.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cgroup.controllers&lt;/code&gt;: A read-only file showing which resource controllers (cpu, memory, io) are currently available to this specific group.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cgroup.kill&lt;/code&gt;: A v2 feature (available since Linux 5.14) that kills every process in the cgroup, including those in its descendant cgroups, when you write 1 to it.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Controller Files (&lt;code&gt;cpu.*&lt;/code&gt;, &lt;code&gt;memory.*&lt;/code&gt;, &lt;code&gt;pids.*&lt;/code&gt;, etc.)&lt;/strong&gt;: Controllers are the actual engines that distribute and limit system resources. Files prefixed with a controller name dictate how that specific resource is managed. These files generally fall into two types:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Configuration (Read-Write)&lt;/strong&gt;: Files you modify to set limits (e.g., &lt;code&gt;memory.max&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Status (Read-Only)&lt;/strong&gt;: Files you read to get live metrics (e.g., &lt;code&gt;memory.stat&lt;/code&gt;). For example, &lt;code&gt;watch cat /sys/fs/cgroup/memory.stat&lt;/code&gt; shows continuously refreshed memory statistics for the root cgroup; point it at another cgroup's directory to watch that group instead.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
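&lt;p&gt;To make the core files concrete, here is a minimal sketch of helpers built only on plain file reads and writes. The function names (and the optional &lt;code&gt;proc_root&lt;/code&gt; parameter, which exists only so the helper can be pointed at a test directory) are hypothetical. On a cgroups v2 system, &lt;code&gt;/proc/PID/cgroup&lt;/code&gt; contains a single line of the form &lt;code&gt;0::/path/to/group&lt;/code&gt;, which is how you can confirm where a process currently lives.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helpers around the cgroup core files (run as root for real cgroups).

# Print the cgroup v2 path of a process, taken from the "0::" line of /proc/PID/cgroup.
cgroup_path_of() {
    pid=${1:-self}
    proc_root=${2:-/proc}   # overridable only to make testing easier
    sed -n 's/^0:://p' "$proc_root/$pid/cgroup"
}

# Move a process into a cgroup by writing its PID to cgroup.procs (one PID per write).
move_to_cgroup() {
    cgroup_dir=$1
    pid=$2
    echo "$pid" > "$cgroup_dir/cgroup.procs"
}

# Kill every process in a cgroup and its descendants (requires cgroup.kill, Linux 5.14+).
kill_cgroup() {
    echo 1 > "$1/cgroup.kill"
}
```

&lt;p&gt;For example, &lt;code&gt;cgroup_path_of $$&lt;/code&gt; prints the cgroup of your current shell, and &lt;code&gt;move_to_cgroup /sys/fs/cgroup/some-group $$&lt;/code&gt; would move the shell into a (hypothetical) group named &lt;code&gt;some-group&lt;/code&gt;.&lt;/p&gt;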

&lt;h3&gt;
  
  
  Key Controller Files
&lt;/h3&gt;

&lt;p&gt;While the Linux kernel supports many controllers, a few are absolutely critical for securing containerized workloads against resource exhaustion and Denial of Service (DoS) attacks.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory (&lt;code&gt;memory.*&lt;/code&gt;)&lt;/strong&gt;: Regulates RAM usage. 

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;memory.max&lt;/code&gt; sets an absolute hard limit. If the processes in the cgroup hit this limit and the kernel cannot reclaim enough memory, the kernel's Out-Of-Memory (OOM) killer steps in and terminates processes in the group.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;memory.high&lt;/code&gt; is a softer throttle limit. If breached, the kernel heavily throttles the processes and forces them to reclaim memory, but avoids outright killing them.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;CPU (&lt;code&gt;cpu.*&lt;/code&gt;)&lt;/strong&gt;: Regulates processor time.

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cpu.max&lt;/code&gt; limits the absolute maximum amount of CPU time the group can use (bandwidth).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cpu.weight&lt;/code&gt; dictates proportional share. If the system is busy, a cgroup with a higher weight gets priority over one with a lower weight.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;PIDs (&lt;code&gt;pids.*&lt;/code&gt;)&lt;/strong&gt;: Regulates process creation.

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pids.max&lt;/code&gt; sets a hard limit on how many processes can exist inside the cgroup. From a security standpoint, this is your primary defense against a Fork Bomb attack, where a malicious script rapidly clones itself to crash the host.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Block I/O (&lt;code&gt;io.*&lt;/code&gt;)&lt;/strong&gt;: Regulates disk read/write bandwidth.

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;io.max&lt;/code&gt; can prevent a compromised container from thrashing the host's storage drives and starving other containers of database reads or log writes.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
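&lt;p&gt;Because every limit is just a file, a quick way to audit a cgroup is to print whichever of these control files exist in its directory. The following is a sketch with a hypothetical &lt;code&gt;show_limits&lt;/code&gt; helper; files for controllers that are not enabled in the group are reported as not available.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: print the key limit files of a cgroup directory.
show_limits() {
    dir=${1:-/sys/fs/cgroup}
    for f in memory.max memory.high cpu.max cpu.weight pids.max io.max; do
        if [ -r "$dir/$f" ]; then
            # Only the first line is shown; io.max can hold one line per device.
            printf '%s: %s\n' "$f" "$(head -n 1 "$dir/$f")"
        else
            printf '%s: (not available)\n' "$f"
        fi
    done
}
```

&lt;p&gt;These limit files exist only in non-root cgroups, so running the helper against &lt;code&gt;/sys/fs/cgroup&lt;/code&gt; itself will report them all as not available; try a child group such as &lt;code&gt;/sys/fs/cgroup/system.slice&lt;/code&gt; instead.&lt;/p&gt;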

&lt;p&gt;For highly specialized workloads, cgroups v2 offers several other controllers. While you might not interact with these daily, it's good to know they exist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cpuset (&lt;code&gt;cpuset.*&lt;/code&gt;)&lt;/strong&gt;: Pins tasks to specific CPU cores and Memory Nodes. This is crucial for high-performance computing on NUMA architectures where memory access latency matters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Devices&lt;/strong&gt;: Controls which device nodes (like &lt;code&gt;/dev/sda&lt;/code&gt; or &lt;code&gt;/dev/random&lt;/code&gt;) a cgroup can access. In v2, this is actually implemented using eBPF programs rather than standard text files.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HugeTLB (&lt;code&gt;hugetlb.*&lt;/code&gt;)&lt;/strong&gt;: Limits the usage of Huge Pages (large blocks of memory) to prevent a single group from exhausting them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RDMA (&lt;code&gt;rdma.*&lt;/code&gt;)&lt;/strong&gt;: Manages Remote Direct Memory Access resources, often used in high-speed clustered networking.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Creating cgroups
&lt;/h2&gt;

&lt;p&gt;Now that we understand how the cgroups filesystem works, let's create a custom cgroup hierarchy.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Warning: As mentioned earlier, do not run these commands on your host machine. Use a VM (examples work with Ubuntu Server 24.04).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Most of the commands we are about to run require root privileges. Let's switch to the root user and install &lt;code&gt;cgroup-tools&lt;/code&gt;, which provides useful utilities like &lt;code&gt;cgcreate&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;su
apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nb"&gt;install &lt;/span&gt;cgroup-tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, let's export some environment variables to make our commands easier to read. We are going to create:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a &lt;strong&gt;parent cgroup&lt;/strong&gt; called &lt;code&gt;scripts&lt;/code&gt; (A parent cgroup is the higher-level group that can contain one or more subgroups. It usually defines the overall resource limits that apply to everything inside it.)&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;child cgroup&lt;/strong&gt; called &lt;code&gt;production&lt;/code&gt; (A child cgroup is a subgroup created inside the parent group. Processes can be placed into the child group, and it can have its own additional limits, but it can never exceed the limits set by its parent.)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PARENT_CGROUP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"scripts"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CHILD_CGROUP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"production"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the parent &lt;code&gt;scripts&lt;/code&gt; group had a limit of 2 GB of memory, then the child &lt;code&gt;production&lt;/code&gt; group could only use up to that 2 GB, even if it tried to set a higher limit. The child can further restrict resources, but it cannot escape the limits of its parent.&lt;/p&gt;

&lt;p&gt;So the structure will look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/sys/fs/cgroup/
└── scripts        (parent cgroup)
    └── production    (child cgroup)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
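&lt;p&gt;The kernel enforces this parent-bounds-child rule on its own; you never have to compute it. Still, it can help to see the logic spelled out. The sketch below (a hypothetical &lt;code&gt;effective_memory_max&lt;/code&gt; helper) walks from a cgroup up to a given root directory and reports the tightest &lt;code&gt;memory.max&lt;/code&gt; it finds along the way, treating &lt;code&gt;max&lt;/code&gt; as unlimited.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: smallest memory.max between a cgroup and the hierarchy root.
effective_memory_max() {
    dir=$1
    root=${2:-/sys/fs/cgroup}
    limit=max
    while :; do
        if [ -r "$dir/memory.max" ]; then
            value=$(cat "$dir/memory.max")
            # "max" means no limit at this level; otherwise keep the smaller value.
            if [ "$value" != max ]; then
                if [ "$limit" = max ] || [ "$value" -lt "$limit" ]; then
                    limit=$value
                fi
            fi
        fi
        # Stop once we reach the root (or the filesystem root, as a safety net).
        if [ "$dir" = "$root" ] || [ "$dir" = / ]; then
            break
        fi
        dir=$(dirname "$dir")
    done
    echo "$limit"
}
```

&lt;p&gt;With the 2 GB example above, &lt;code&gt;effective_memory_max /sys/fs/cgroup/scripts/production&lt;/code&gt; would report the parent's 2 GB value even if &lt;code&gt;production&lt;/code&gt; itself were left at &lt;code&gt;max&lt;/code&gt;.&lt;/p&gt;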



&lt;p&gt;While you can create cgroups using standard Linux commands (e.g., &lt;code&gt;mkdir /sys/fs/cgroup/scripts&lt;/code&gt;), using the &lt;code&gt;cgcreate&lt;/code&gt; utility allows us to explicitly request which controllers we want to enable.&lt;/p&gt;

&lt;p&gt;Let's create our parent cgroup and request only the &lt;code&gt;memory&lt;/code&gt; and &lt;code&gt;cpu&lt;/code&gt; controllers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cgcreate &lt;span class="nt"&gt;-g&lt;/span&gt; memory,cpu:/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PARENT_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the command returns no output, it was successful. Let's look inside the newly created directory: &lt;code&gt;ls /sys/fs/cgroup/${PARENT_CGROUP}&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1xdqyc0t2i4ytxg06h7b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1xdqyc0t2i4ytxg06h7b.png" alt=" " width="800" height="98"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You will see a large list of files representing the parameters and statistics for this new group. However, if you look closely at the active controllers, you might notice something unexpected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;root@container-security:/home/user#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /sys/fs/cgroup/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PARENT_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/cgroup.controllers
&lt;span class="go"&gt;cpu memory pids
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;pids&lt;/code&gt; controller is active, even though we only requested &lt;code&gt;memory&lt;/code&gt; and &lt;code&gt;cpu&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;To understand why &lt;code&gt;pids&lt;/code&gt; showed up, we need to look at the root cgroup (&lt;code&gt;/sys/fs/cgroup/&lt;/code&gt;). Run this command: &lt;code&gt;cat /sys/fs/cgroup/cgroup.subtree_control&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;root@container-security:~#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /sys/fs/cgroup/cgroup.subtree_control
&lt;span class="go"&gt;cpu memory pids
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In cgroups v2, resource controllers are strictly delegated &lt;strong&gt;top-down&lt;/strong&gt;. The &lt;code&gt;cgroup.subtree_control&lt;/code&gt; file dictates which controllers are passed down to a group's immediate children. Because the root cgroup is configured to delegate &lt;code&gt;cpu&lt;/code&gt;, &lt;code&gt;memory&lt;/code&gt;, and &lt;code&gt;pids&lt;/code&gt;, our new &lt;code&gt;${PARENT_CGROUP}&lt;/code&gt; automatically inherited all three.&lt;/p&gt;
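&lt;p&gt;You can also manage delegation by hand: writing space-separated tokens such as &lt;code&gt;+cpu&lt;/code&gt; (enable) or &lt;code&gt;-io&lt;/code&gt; (disable) to a group's &lt;code&gt;cgroup.subtree_control&lt;/code&gt; changes which controllers its children receive. The sketch below wraps that syntax in two hypothetical helpers; bare controller names are treated as enable requests.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helpers for cgroup.subtree_control.

# Build the "+ctrl"/"-ctrl" token string the kernel expects.
subtree_control_arg() {
    out=""
    for c in "$@"; do
        case "$c" in
            +*|-*) tok=$c ;;       # already has an explicit +/- prefix
            *)     tok="+$c" ;;    # bare name: enable it
        esac
        out="${out:+$out }$tok"
    done
    printf '%s\n' "$out"
}

# Enable (or disable) controllers for the children of a cgroup directory.
set_subtree_control() {
    dir=$1
    shift
    subtree_control_arg "$@" > "$dir/cgroup.subtree_control"
}
```

&lt;p&gt;For example, &lt;code&gt;set_subtree_control /sys/fs/cgroup/scripts cpu memory&lt;/code&gt; is equivalent to &lt;code&gt;echo "+cpu +memory" &amp;gt; /sys/fs/cgroup/scripts/cgroup.subtree_control&lt;/code&gt;. Expect the write to fail if the group does not itself have the controller available, or if the group still has processes of its own.&lt;/p&gt;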

&lt;blockquote&gt;
&lt;p&gt;The &lt;code&gt;pids&lt;/code&gt; controller in cgroups limits the number of processes (PIDs) that a group can create. A PID is simply a process identifier used by the Linux kernel to track running processes. It is usually enabled by default to prevent fork bombs and runaway process creation. Without it, cgroups could limit CPU and memory, but not process count, which is a safety risk.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Before we create our child cgroup, there is a crucial cgroups v2 rule you must know: &lt;strong&gt;The No Internal Process Constraint&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In v2, a cgroup can either have processes assigned to it or delegate controllers to child cgroups; &lt;strong&gt;it cannot do both&lt;/strong&gt;. (The only exception is the root cgroup.)&lt;/p&gt;

&lt;p&gt;Because our &lt;code&gt;${PARENT_CGROUP}&lt;/code&gt; is going to delegate &lt;code&gt;cpu&lt;/code&gt; and &lt;code&gt;memory&lt;/code&gt; to its children, the kernel will refuse to let you assign any running processes directly to &lt;code&gt;${PARENT_CGROUP}&lt;/code&gt;. Instead, processes must be assigned to the leaf nodes of the tree (the final child directories).&lt;/p&gt;

&lt;p&gt;Let's create the child cgroup where our actual demo processes will live:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cgcreate &lt;span class="nt"&gt;-g&lt;/span&gt; memory,cpu:/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PARENT_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CHILD_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Although we created the parent and child cgroups in two separate steps, this was mainly for demonstration purposes. In practice, the first &lt;code&gt;cgcreate&lt;/code&gt; command is technically redundant because running the second command (&lt;code&gt;cgcreate -g memory,cpu:/${PARENT_CGROUP}/${CHILD_CGROUP}&lt;/code&gt;) would automatically create both the parent (scripts) and the child (production) cgroups if the parent does not already exist.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When we ran this, &lt;code&gt;cgcreate&lt;/code&gt; automatically updated the &lt;code&gt;cgroup.subtree_control&lt;/code&gt; file in the parent directory to delegate the requested controllers down to the child. We can verify this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;root@container-security:/home/user#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /sys/fs/cgroup/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PARENT_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/cgroup.subtree_control
&lt;span class="go"&gt;cpu memory
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, let's look inside our new child cgroup: &lt;code&gt;ls /sys/fs/cgroup/${PARENT_CGROUP}/${CHILD_CGROUP}&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If you check the files here, you will see &lt;code&gt;cpu.*&lt;/code&gt; and &lt;code&gt;memory.*&lt;/code&gt; files, but no &lt;code&gt;pids.*&lt;/code&gt; or &lt;code&gt;io.*&lt;/code&gt; files, because the parent only delegates &lt;code&gt;cpu&lt;/code&gt; and &lt;code&gt;memory&lt;/code&gt;. We now have a leaf cgroup with exactly the controllers we asked for, ready to constrain our applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feyj9xuucxjer1htv5de0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feyj9xuucxjer1htv5de0.png" alt=" " width="800" height="73"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Resource Limits
&lt;/h2&gt;

&lt;p&gt;Having created our isolated cgroup hierarchy, it is time to actually enforce some boundaries. This is where the core security value of cgroups shines: by setting strict resource limits, we protect the host system from resource exhaustion attacks and ensure predictable performance.&lt;/p&gt;

&lt;p&gt;While you can configure these limits by directly writing to the files with &lt;code&gt;echo&lt;/code&gt; (e.g., &lt;code&gt;echo "20000 50000" &amp;gt; /sys/fs/cgroup/my_group/cpu.max&lt;/code&gt;), we will use the &lt;code&gt;cgset&lt;/code&gt; utility from the &lt;code&gt;cgroup-tools&lt;/code&gt; package we installed earlier, as it provides a cleaner syntax for setting multiple limits at once.&lt;/p&gt;

&lt;p&gt;Before we apply the limits to our cgroup, let's understand exactly what we are controlling.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CPU Throttling (&lt;code&gt;cpu.max&lt;/code&gt;)&lt;/strong&gt;: In cgroups v2, CPU limits use a simple quota-based model formatted as &lt;code&gt;$MAX $PERIOD&lt;/code&gt;. If you set the value to 100000 1000000, you are telling the kernel: For every 1,000,000 microseconds (1 second) of time, this group is allowed to use the CPU for 100,000 microseconds (a tenth of a second). This effectively limits the cgroup to 10% of a single CPU core. &lt;em&gt;Security Note&lt;/em&gt;: Unlike memory limits, CPU limits act as a throttle. If a process hits its CPU limit, the kernel simply pauses it until the next period begins. CPU throttling slows applications down, but it never outright kills them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Memory Limits (&lt;code&gt;memory.max&lt;/code&gt; &amp;amp; &lt;code&gt;memory.swap.max&lt;/code&gt;)&lt;/strong&gt;: &lt;code&gt;memory.max&lt;/code&gt; sets an absolute ceiling on RAM usage, and &lt;code&gt;memory.swap.max&lt;/code&gt; caps how much of the group's memory can be swapped out. When a cgroup reaches &lt;code&gt;memory.max&lt;/code&gt;, the kernel does not let it grow any further; instead, it aggressively tries to reclaim memory from that cgroup by dropping cached data or swapping pages out to disk, which noticeably slows the offending processes down. If the processes keep demanding memory and the kernel cannot reclaim enough (for example, because swap is also exhausted), the kernel triggers the Out-Of-Memory (OOM) killer. It selects a process within that cgroup based on its OOM score and terminates it, protecting the rest of the host system.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
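&lt;p&gt;The arithmetic behind &lt;code&gt;cpu.max&lt;/code&gt; is simple enough to script. The sketch below uses two hypothetical helpers: one turns a percentage of a single core into a &lt;code&gt;$MAX $PERIOD&lt;/code&gt; pair, the other converts such a pair back. It assumes the kernel's default period of 100000 microseconds when none is given; a quota larger than the period (for example, 200%) allows the group to use more than one core.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helpers for the cgroups v2 cpu.max format: "$MAX $PERIOD" (microseconds).

# Percentage of one CPU core -> cpu.max value. Default period: 100000 us (100 ms).
cpu_max_for_percent() {
    percent=$1
    period=${2:-100000}
    echo "$(( period * percent / 100 )) $period"
}

# cpu.max value (read from stdin) -> percentage of one core, or "unlimited".
cpu_percent_from_max() {
    read -r quota period
    if [ "$quota" = max ]; then
        echo unlimited
    else
        echo "$(( quota * 100 / period ))"
    fi
}
```

&lt;p&gt;For instance, &lt;code&gt;cpu_max_for_percent 15 1000000&lt;/code&gt; prints &lt;code&gt;150000 1000000&lt;/code&gt;, the value we apply below, and &lt;code&gt;echo "max 100000" | cpu_percent_from_max&lt;/code&gt; prints &lt;code&gt;unlimited&lt;/code&gt;.&lt;/p&gt;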

&lt;p&gt;For the tests, we want to intentionally induce an early OOM kill. To make this happen reliably, we need to limit both physical memory and swap. Otherwise (the default &lt;code&gt;memory.swap.max&lt;/code&gt; is &lt;code&gt;max&lt;/code&gt;), the kernel could keep pushing our runaway process's pages into swap space, delaying the OOM kill.&lt;/p&gt;

&lt;p&gt;Let's apply a 15% CPU limit and a roughly 200MB limit for both RAM and swap to our &lt;code&gt;production&lt;/code&gt; child cgroup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cgset &lt;span class="nt"&gt;-r&lt;/span&gt; memory.max&lt;span class="o"&gt;=&lt;/span&gt;200000000 &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PARENT_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CHILD_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="c"&gt;# (Note: Memory values here are in bytes, but you could also use suffixes like 100M or 1G.)&lt;/span&gt;
cgset &lt;span class="nt"&gt;-r&lt;/span&gt; memory.swap.max&lt;span class="o"&gt;=&lt;/span&gt;200000000 &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PARENT_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CHILD_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
cgset &lt;span class="nt"&gt;-r&lt;/span&gt; cpu.max&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"150000 1000000"&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PARENT_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CHILD_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
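&lt;p&gt;If &lt;code&gt;cgroup-tools&lt;/code&gt; is not available, the same three limits can be applied by writing to the control files directly, as mentioned earlier. Here is a minimal sketch using a hypothetical &lt;code&gt;apply_demo_limits&lt;/code&gt; function that takes the cgroup directory as its only argument.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: plain-file equivalent of the three cgset commands above.
apply_demo_limits() {
    dir=$1
    echo 200000000 > "$dir/memory.max"         # ~200 MB of RAM (bytes)
    echo 200000000 > "$dir/memory.swap.max"    # ~200 MB of swap (bytes)
    echo "150000 1000000" > "$dir/cpu.max"     # 15% of one core
}
```

&lt;p&gt;On a real system you would call it as root, e.g. &lt;code&gt;apply_demo_limits /sys/fs/cgroup/scripts/production&lt;/code&gt;.&lt;/p&gt;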



&lt;p&gt;Let’s verify that the kernel accepted our new limits by reading the files directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /sys/fs/cgroup/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PARENT_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CHILD_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="o"&gt;{&lt;/span&gt;memory,cpu,memory.swap&lt;span class="o"&gt;}&lt;/span&gt;.max
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see an output similar to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;199999488
150000 1000000
199999488
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You might be wondering why the &lt;code&gt;200000000&lt;/code&gt; bytes we assigned for memory suddenly changed to &lt;code&gt;199999488&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The kernel manages memory in fixed-size blocks called "pages." On most standard systems, a memory page is exactly 4096 bytes (you can verify your system's page size by running &lt;code&gt;getconf PAGE_SIZE&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;When you request a memory limit, the kernel rounds your request down to the nearest whole page. Dividing our requested 200,000,000 bytes by 4096 gives 48,828.125 pages. The kernel drops the fraction and grants exactly 48,828 pages. Multiply 48,828 by 4096 and you get &lt;code&gt;199,999,488&lt;/code&gt;, the exact byte limit the kernel applied. The same rounding applies to &lt;code&gt;memory.swap.max&lt;/code&gt;, which is why it shows the same value.&lt;/p&gt;
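&lt;p&gt;You can reproduce the rounding with shell arithmetic. This is a minimal sketch using a hypothetical helper, &lt;code&gt;round_to_page&lt;/code&gt;. It mirrors the round-down behavior described above and does not query the kernel.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# round_to_page BYTES [PAGE_SIZE]
# Hypothetical helper: rounds a byte count down to a whole number of pages,
# mirroring how memory.max / memory.swap.max end up as page multiples.
# PAGE_SIZE defaults to the running system's page size.
round_to_page() {
  local bytes=$1
  local page=${2:-$(getconf PAGE_SIZE)}
  local pages=$(( bytes / page ))   # integer division drops the partial page
  echo $(( pages * page ))
}

round_to_page 200000000 4096   # prints 199999488 (48828 pages of 4096 bytes)
```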

&lt;h2&gt;
  
  
  Testing and Managing Cgroup Processes
&lt;/h2&gt;

&lt;p&gt;Now that our resource limits are strictly defined in our &lt;code&gt;production&lt;/code&gt; cgroup, it’s time to put them to the test. We will observe how cgroups throttle CPU usage, how they handle memory exhaustion, and how we can use built-in tools to manage these processes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stressing the CPU
&lt;/h3&gt;

&lt;p&gt;Let's start by establishing a baseline. We will run a command that is notorious for hogging 100% of a CPU core: copying an infinite stream of zeros into the void. Run this command directly on your host (outside our restricted cgroup):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;dd &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/dev/zero &lt;span class="nv"&gt;of&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/dev/null &amp;amp;
&lt;span class="nb"&gt;sleep &lt;/span&gt;2
ps &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="nv"&gt;$!&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; %cpu
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because this process has no bounds, the output will show it consuming nearly 100% of the CPU:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;%CPU
98.0
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run &lt;code&gt;kill $!&lt;/code&gt; to stop the process before moving on.&lt;/p&gt;

&lt;p&gt;Now, let's run that exact same command, but this time we will use &lt;code&gt;cgexec&lt;/code&gt; to launch it directly inside our restricted child cgroup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cgexec &lt;span class="nt"&gt;-g&lt;/span&gt; memory,cpu:&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PARENT_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CHILD_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nb"&gt;dd &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/dev/zero &lt;span class="nv"&gt;of&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/dev/null &amp;amp;
&lt;span class="nb"&gt;sleep &lt;/span&gt;2
ps &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="nv"&gt;$!&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; %cpu
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the output now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;%CPU
15.3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run &lt;code&gt;kill $!&lt;/code&gt; to stop the process.&lt;/p&gt;

&lt;p&gt;The CPU usage hovers right around the 15% limit we defined earlier! If you watch this process in a live monitor like &lt;code&gt;htop&lt;/code&gt;, you will see it consistently stay at or below that threshold. The kernel is aggressively pausing and resuming the process to enforce our quota.&lt;/p&gt;
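&lt;p&gt;The percentage comes directly from the two numbers in &lt;code&gt;cpu.max&lt;/code&gt;: the group may run for QUOTA microseconds in every PERIOD microseconds. The hypothetical helper below is a sketch of that division. A quota larger than the period (which allows more than one CPU) yields a value above 100%.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# cpu_max_percent "QUOTA PERIOD"
# Hypothetical helper: converts a cgroup v2 cpu.max value into the share of
# one CPU the group may use. A quota of "max" means no limit.
cpu_max_percent() {
  local value=$1
  local quota=${value%% *}    # first field: runtime allowed per period (microseconds)
  local period=${value##* }   # second field: period length (microseconds)
  if [ "$quota" = "max" ]; then
    echo "unlimited"
  else
    # Multiply before dividing so integer arithmetic keeps the percentage.
    echo "$(( quota * 100 / period ))%"
  fi
}

cpu_max_percent "150000 1000000"   # prints 15%
cpu_max_percent "max 100000"       # prints unlimited
```

&lt;p&gt;On the live system you could pass it the file contents, for example &lt;code&gt;cpu_max_percent "$(cat /sys/fs/cgroup/${PARENT_CGROUP}/${CHILD_CGROUP}/cpu.max)"&lt;/code&gt;.&lt;/p&gt;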

&lt;h3&gt;
  
  
  Filling Up the Memory (Triggering an OOM Kill)
&lt;/h3&gt;

&lt;p&gt;Let's see what happens when a process refuses to stay within its memory limits. We will launch a bash process inside our cgroup that appends 10MB of data to an array every second until it is killed. The script soon breaches the roughly 200MB memory limit we imposed. We also capped swap at roughly 200MB, so the kernel can page out only a limited amount of that data before the group runs out of room.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cgexec &lt;span class="nt"&gt;-g&lt;/span&gt; memory,cpu:&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PARENT_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CHILD_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
bash &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s1"&gt;'a=(); while true; do a+=("$(head -c 10M /dev/zero | tr "\0" "A")"); sleep 1; done'&lt;/span&gt; &amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can watch the memory footprint (RSS, the resident set size) grow in real time with the &lt;code&gt;watch&lt;/code&gt; command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;watch ps &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="nv"&gt;$!&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; rss,sz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At 10MB per second, the group exhausts its memory (and its swap allowance, if the host has swap enabled) within tens of seconds. The kernel's Out-Of-Memory (OOM) killer then intervenes to protect the host, and you will see output like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;[1]+  Killed                  cgexec -g memory,cpu:$&lt;/span&gt;&lt;span class="o"&gt;{&lt;/span&gt;PARENT_CGROUP&lt;span class="o"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CHILD_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; bash &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s1"&gt;'a=(); while true; do a+=("$(head -c 10M /dev/zero | tr "\0" "A")"); sleep 1; done'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
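&lt;p&gt;To confirm that the kill came from the cgroup's own limit, read the group's &lt;code&gt;memory.events&lt;/code&gt; file. Its &lt;code&gt;oom_kill&lt;/code&gt; counter records how many processes in the group the OOM killer has terminated. The helper below is a hypothetical sketch, and its default path assumes the &lt;code&gt;PARENT_CGROUP&lt;/code&gt; and &lt;code&gt;CHILD_CGROUP&lt;/code&gt; variables used throughout this chapter.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# memory_event COUNTER [EVENTS_FILE]
# Hypothetical helper: prints one counter from a cgroup v2 memory.events file.
# Typical keys include: low, high, max, oom, oom_kill.
memory_event() {
  local key=$1
  local file=${2:-/sys/fs/cgroup/${PARENT_CGROUP}/${CHILD_CGROUP}/memory.events}
  # Each line is "NAME VALUE"; match NAME exactly so "oom" != "oom_kill".
  awk -v k="$key" '$1 == k { print $2 }' "$file"
}

# Example: memory_event oom_kill   (number of OOM kills in this cgroup)
```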



&lt;p&gt;In this setup we used &lt;code&gt;memory.max&lt;/code&gt;, which acts as a hard limit and triggers the OOM killer when exceeded. A softer and safer approach is to use &lt;code&gt;memory.high&lt;/code&gt; instead. When a group's usage goes over &lt;code&gt;memory.high&lt;/code&gt;, the kernel heavily throttles its processes and applies strong memory reclaim pressure. This forces them to slow down and release memory, so the limit works more like a “speed bump” than a hard stop. Monitoring systems and administrators get time to react before the application is terminated.&lt;/p&gt;
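&lt;p&gt;Here is a sketch of that softer setup. The hypothetical helper below sets &lt;code&gt;memory.high&lt;/code&gt; to a percentage of the group's existing &lt;code&gt;memory.max&lt;/code&gt;, so reclaim pressure starts before the hard limit is reached. The 80% used in the example is an arbitrary assumption, not a recommended value. Writing the file requires root.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# set_memory_high CGROUP_DIR PERCENT
# Hypothetical helper: writes memory.high as PERCENT of the group's memory.max
# so throttling and reclaim begin before the hard (OOM) limit.
set_memory_high() {
  local dir=$1 pct=$2
  local max
  max=$(cat "$dir/memory.max")
  if [ "$max" = "max" ]; then
    echo "memory.max is unlimited; set a hard limit first" >/dev/stderr
    return 1
  fi
  # Integer percentage of the hard limit; the kernel rounds it down to whole pages.
  echo $(( max * pct / 100 )) > "$dir/memory.high"
}

# Example (as root): set_memory_high /sys/fs/cgroup/${PARENT_CGROUP}/${CHILD_CGROUP} 80
```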

&lt;h3&gt;
  
  
  Monitoring with &lt;code&gt;systemd-cgtop&lt;/code&gt; and &lt;code&gt;systemd-cgls&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Just as &lt;code&gt;top&lt;/code&gt; and &lt;code&gt;ls&lt;/code&gt; let you inspect processes and files, systemd provides &lt;code&gt;systemd-cgtop&lt;/code&gt; and &lt;code&gt;systemd-cgls&lt;/code&gt; specifically for monitoring cgroups.&lt;/p&gt;

&lt;p&gt;First, let's populate our cgroup with a few sleeping background processes so we have something to look at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;p &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;1..5&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;cgexec &lt;span class="nt"&gt;-g&lt;/span&gt; memory,cpu:&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PARENT_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CHILD_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nb"&gt;sleep &lt;/span&gt;2000 &amp;amp; &lt;span class="k"&gt;done
&lt;/span&gt;cgexec &lt;span class="nt"&gt;-g&lt;/span&gt; memory,cpu:&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PARENT_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CHILD_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nb"&gt;dd &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/dev/zero &lt;span class="nv"&gt;of&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/dev/null &amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, run: &lt;code&gt;systemd-cgtop&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;You will get a clean, live-updating table showing the resource consumption aggregated by cgroup:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fblukor7lop354qgftjhf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fblukor7lop354qgftjhf.png" alt=" " width="800" height="75"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want a hierarchical tree view of exactly which PIDs belong to which groups, use &lt;code&gt;systemd-cgls&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;root@container-security:/home/user#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;systemd-cgls /scripts
&lt;span class="go"&gt;CGroup /scripts:
└─production
  ├─2142 sleep 2000
  ├─2143 sleep 2000
  ├─2144 sleep 2000
  ├─2145 sleep 2000
  ├─2146 sleep 2000
  └─2147 dd if=/dev/zero of=/dev/null
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
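&lt;p&gt;&lt;code&gt;systemd-cgls&lt;/code&gt; builds its tree from the cgroup filesystem itself. Every cgroup v2 directory contains a &lt;code&gt;cgroup.procs&lt;/code&gt; file that lists the PIDs of its direct members, one per line. A rough equivalent for a single group, using a hypothetical helper name, looks like this:&lt;/p&gt;

```shell
#!/usr/bin/env bash
# list_cgroup_members CGROUP_DIR
# Hypothetical helper: prints "PID COMMAND" for each process listed in the
# group's cgroup.procs file (direct members only, not child cgroups).
list_cgroup_members() {
  local dir=$1 pid
  for pid in $(cat "$dir/cgroup.procs"); do
    # /proc/PID/comm holds the short command name. A process may exit between
    # reading cgroup.procs and reading comm, so errors are ignored.
    printf '%s %s\n' "$pid" "$(cat "/proc/$pid/comm" 2>/dev/null)"
  done
}

# Example: list_cgroup_members /sys/fs/cgroup/${PARENT_CGROUP}/${CHILD_CGROUP}
```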



&lt;h3&gt;
  
  
  Killing All Processes in a Cgroup
&lt;/h3&gt;

&lt;p&gt;One of the best new features in cgroups v2 is the &lt;code&gt;cgroup.kill&lt;/code&gt; file. Instead of hunting down individual PIDs, you can instantly terminate everything inside a cgroup by writing a 1 to this file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo &lt;/span&gt;1 &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /sys/fs/cgroup/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PARENT_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CHILD_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/cgroup.kill
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you press Enter a couple of times, the shell reports that the background jobs we spawned earlier (the &lt;code&gt;sleep&lt;/code&gt; processes and &lt;code&gt;dd&lt;/code&gt;) have all been killed. Running &lt;code&gt;systemd-cgls /scripts&lt;/code&gt; now shows an empty group.&lt;/p&gt;
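&lt;p&gt;The &lt;code&gt;cgroup.kill&lt;/code&gt; file only exists on newer kernels (it was added in Linux 5.14). The hypothetical helper below uses it when present and otherwise falls back to sending &lt;code&gt;SIGKILL&lt;/code&gt; to each PID in &lt;code&gt;cgroup.procs&lt;/code&gt;. The fallback is weaker: it ignores child cgroups, and it can miss a process that forks while the loop runs. &lt;code&gt;cgroup.kill&lt;/code&gt; handles that race inside the kernel.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# kill_cgroup CGROUP_DIR
# Hypothetical helper: kills every process in a cgroup v2 group.
# Prefers cgroup.kill (kills the group and its descendants atomically);
# otherwise signals each direct member listed in cgroup.procs.
kill_cgroup() {
  local dir=$1 pid
  if [ -e "$dir/cgroup.kill" ]; then
    echo 1 > "$dir/cgroup.kill"
  else
    for pid in $(cat "$dir/cgroup.procs"); do
      kill -KILL "$pid" 2>/dev/null   # the process may already be gone
    done
  fi
}

# Example: kill_cgroup /sys/fs/cgroup/${PARENT_CGROUP}/${CHILD_CGROUP}
```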

&lt;h3&gt;
  
  
  Moving an Already-Running Process (&lt;code&gt;cgclassify&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;So far, we have been launching new processes directly into our cgroup using &lt;code&gt;cgexec&lt;/code&gt;. But what if a runaway process is already running on the host, and you want to lock it down on the fly?&lt;/p&gt;

&lt;p&gt;We can use the &lt;code&gt;cgclassify&lt;/code&gt; command for this. Let's start our CPU hog on the host system without limits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;dd &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/dev/zero &lt;span class="nv"&gt;of&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/dev/null &amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is currently consuming 100% of a core. Time to cage it. We use &lt;code&gt;cgclassify&lt;/code&gt; and pass it the PID (using &lt;code&gt;$!&lt;/code&gt; for the last background process):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cgclassify &lt;span class="nt"&gt;-g&lt;/span&gt; cpu,memory:&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PARENT_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CHILD_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nv"&gt;$!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;If you run &lt;code&gt;ps -p $! -o %cpu&lt;/code&gt; right after classifying the process, you might notice something strange. It may report 75% or 50%, slowly ticking down, rather than dropping straight to 15%. This happens because &lt;code&gt;ps&lt;/code&gt; does not show instantaneous CPU usage. It reports the average CPU usage over the entire lifetime of the process. The process ran at 100% for a few seconds before we caged it, so that lifetime average takes a while to fall. If you look at the process in &lt;code&gt;htop&lt;/code&gt; or &lt;code&gt;systemd-cgtop&lt;/code&gt; instead, you will see that its real-time usage dropped to about 15% as soon as you ran the &lt;code&gt;cgclassify&lt;/code&gt; command.&lt;/p&gt;
&lt;/blockquote&gt;
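&lt;p&gt;Tools like &lt;code&gt;systemd-cgtop&lt;/code&gt; avoid the lifetime-average problem by measuring deltas. The sketch below applies the same idea to a cgroup. It samples the cumulative &lt;code&gt;usage_usec&lt;/code&gt; counter in &lt;code&gt;cpu.stat&lt;/code&gt; twice and divides the difference by the elapsed time. The helper names are hypothetical, and the result is expressed as a percentage of one CPU.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# cpu_percent_between USAGE_USEC_1 USAGE_USEC_2 SECONDS
# Converts two cumulative CPU-time samples (microseconds) into the average
# usage, as a percentage of one CPU, over the interval between them.
cpu_percent_between() {
  local before=$1 after=$2 secs=$3
  echo $(( (after - before) * 100 / (secs * 1000000) ))
}

# cgroup_cpu_percent CGROUP_DIR [SECONDS]
# Hypothetical helper: samples usage_usec from cpu.stat twice, SECONDS apart.
cgroup_cpu_percent() {
  local dir=$1 secs=${2:-1} before after
  before=$(awk '$1 == "usage_usec" { print $2 }' "$dir/cpu.stat")
  sleep "$secs"
  after=$(awk '$1 == "usage_usec" { print $2 }' "$dir/cpu.stat")
  cpu_percent_between "$before" "$after" "$secs"
}

# Example: cgroup_cpu_percent /sys/fs/cgroup/${PARENT_CGROUP}/${CHILD_CGROUP} 2
```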

&lt;p&gt;Kill the process with: &lt;code&gt;echo 1 &amp;gt; /sys/fs/cgroup/${PARENT_CGROUP}/${CHILD_CGROUP}/cgroup.kill&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Viewing Configuration with &lt;code&gt;cgget&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;If you ever need to audit a cgroup to see exactly how it is configured and what its current stats are, &lt;code&gt;cgget&lt;/code&gt; is your go-to command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cgget &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PARENT_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CHILD_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This dumps the contents of all the controller files into an easy-to-read list. It shows your configured limits, current usage metrics, and even how many times the OOM killer has been triggered (the &lt;code&gt;oom_kill&lt;/code&gt; counter).&lt;/p&gt;

&lt;h3&gt;
  
  
  Cleaning Up
&lt;/h3&gt;

&lt;p&gt;To keep your system clean, you can recursively delete the cgroups we just created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cgdelete &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; cpu:/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PARENT_CGROUP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;You might wonder why commands like &lt;code&gt;cgexec&lt;/code&gt; and &lt;code&gt;cgdelete&lt;/code&gt; require you to specify a controller (like &lt;code&gt;-g cpu:&lt;/code&gt;) even though cgroups v2 uses a unified hierarchy. This is a quirk kept for backward compatibility with cgroups v1 syntax. The command needs the argument to run, but in a v2 environment the operation applies to the single unified group regardless of which controller you type.&lt;/p&gt;
&lt;/blockquote&gt;
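&lt;p&gt;Without libcgroup's tools, you can do the same cleanup directly on the filesystem. A cgroup v2 group is removed with &lt;code&gt;rmdir&lt;/code&gt; once it contains no processes and no child groups, so the deepest groups must go first. Here is a minimal sketch with a hypothetical helper name:&lt;/p&gt;

```shell
#!/usr/bin/env bash
# remove_cgroup_tree CGROUP_DIR
# Hypothetical helper: removes a cgroup and all of its descendants, deepest
# first. Each group must already be free of processes (e.g. via cgroup.kill).
# On cgroupfs the interface files inside a group do not block rmdir.
remove_cgroup_tree() {
  local dir=$1
  # -depth lists children before their parents, so each rmdir only sees
  # a directory whose subdirectories are already gone.
  find "$dir" -depth -type d -exec rmdir {} \;
}

# Example: remove_cgroup_tree /sys/fs/cgroup/${PARENT_CGROUP}
```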

&lt;h2&gt;
  
  
  Containers and Cgroups
&lt;/h2&gt;

&lt;p&gt;Throughout this chapter, we manually created cgroups, configured resource limits, and assigned processes to them. While this is the best way to learn how the Linux kernel enforces resource distribution, you rarely have to do this by hand in the real world. You don't have to be using containers to take advantage of cgroups, but modern container runtimes provide an incredibly convenient abstraction layer over them.&lt;/p&gt;

&lt;p&gt;When you run a containerized application, runtimes like Docker or containerd automatically interact with the cgroups filesystem on your behalf. Behind the scenes, the runtime creates a dedicated cgroup hierarchy specifically for that container (typically using the long container ID as the directory name).&lt;/p&gt;
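&lt;p&gt;You can locate any process's cgroup from the host. On a cgroup v2 system, &lt;code&gt;/proc/PID/cgroup&lt;/code&gt; contains a line of the form &lt;code&gt;0::/path&lt;/code&gt;, where the path is relative to &lt;code&gt;/sys/fs/cgroup&lt;/code&gt;. The hypothetical helper below turns that entry into a full directory path. The exact path used for a container depends on the runtime and its cgroup driver.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# cgroup_dir_of PID [CGROUP_FILE]
# Hypothetical helper: prints the cgroup v2 directory of a process by reading
# the unified-hierarchy entry ("0::/path") from /proc/PID/cgroup.
cgroup_dir_of() {
  local pid=$1
  local file=${2:-/proc/$pid/cgroup}
  local rel
  rel=$(sed -n 's/^0:://p' "$file")   # keep only the path after "0::"
  echo "/sys/fs/cgroup${rel}"
}

# Example: cgroup_dir_of "$(docker inspect --format '{{.State.Pid}}' CONTAINER)"
```

&lt;p&gt;Reading &lt;code&gt;memory.max&lt;/code&gt; or &lt;code&gt;cpu.max&lt;/code&gt; inside the returned directory shows the limits the runtime applied.&lt;/p&gt;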

&lt;p&gt;When you pass a flag like &lt;code&gt;--memory 100M&lt;/code&gt; to &lt;code&gt;docker run&lt;/code&gt>, or define a CPU limit in a Kubernetes Pod specification, the container engine translates those human-readable requests into the &lt;code&gt;memory.max&lt;/code&gt; and &lt;code&gt;cpu.max&lt;/code&gt; files we explored earlier.&lt;/p&gt;
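&lt;p&gt;As a rough illustration of that translation, the sketch below converts a Docker-style &lt;code&gt;--cpus&lt;/code&gt; value and &lt;code&gt;--memory&lt;/code&gt; size into the values that would appear in &lt;code&gt;cpu.max&lt;/code&gt; and &lt;code&gt;memory.max&lt;/code&gt;. It assumes a 100000-microsecond CPU period; Docker documents &lt;code&gt;--cpus=1.5&lt;/code&gt; as equivalent to &lt;code&gt;--cpu-period=100000 --cpu-quota=150000&lt;/code&gt;. It also assumes 1024-based unit suffixes for memory. This is an approximation, not the engine's code, so check the real files on your host.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# cpus_to_cpu_max CPUS
# Approximates how a --cpus value maps onto cgroup v2 cpu.max, assuming
# a 100000-microsecond period.
cpus_to_cpu_max() {
  awk -v c="$1" 'BEGIN { printf "%d 100000\n", c * 100000 + 0.5 }'
}

# mem_to_bytes SIZE
# Converts whole-number sizes like 512k, 100M or 2g to bytes using
# 1024-based units (assumed Docker-style suffixes for --memory).
mem_to_bytes() {
  local size=$1
  local num=${size%[bBkKmMgG]}   # strip a single trailing unit letter
  local unit=${size#"$num"}
  case "$unit" in
    k|K) echo $(( num * 1024 )) ;;
    m|M) echo $(( num * 1024 * 1024 )) ;;
    g|G) echo $(( num * 1024 * 1024 * 1024 )) ;;
    *)   echo "$num" ;;   # no suffix (or b) means bytes
  esac
}

cpus_to_cpu_max 1.5    # prints "150000 100000"
mem_to_bytes 100M      # prints 104857600
```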

&lt;p&gt;From a security standpoint, understanding this underlying mechanism is critical. Constraining resources provides a powerful layer of protection against &lt;strong&gt;resource exhaustion&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Whether an attacker deliberately exploits an application to consume excess memory, or a simple bug causes an accidental CPU spike, an unbounded container can easily starve legitimate applications running on the same host. By setting explicit memory and CPU limits on your container deployments, you ensure that the kernel's cgroups will throttle or kill the offending process before it can bring down your entire infrastructure.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This article is one piece of the Ultimate Container Security Series, an ongoing effort to organize and explain container security concepts in a practical way. If you want to explore related topics or see what’s coming next, the &lt;a href="https://dev.to/0xalphasecurity/ultimate-container-security-series-2628"&gt;series introduction post provides the complete roadmap&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>containers</category>
      <category>cybersecurity</category>
      <category>linux</category>
      <category>docker</category>
    </item>
    <item>
      <title>Chapter 4: Linux Capabilities</title>
      <dc:creator>0xAlphaSecurity</dc:creator>
      <pubDate>Fri, 06 Mar 2026 15:05:03 +0000</pubDate>
      <link>https://forem.com/0xalphasecurity/chapter-4-linux-capabilities-5452</link>
      <guid>https://forem.com/0xalphasecurity/chapter-4-linux-capabilities-5452</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This post is part of the Ultimate Container Security Series, a structured, multi-part guide covering container security from foundational concepts to runtime protection. For an overview of the series structure, scope, and update schedule, &lt;a href="https://dev.to/0xalphasecurity/ultimate-container-security-series-2628"&gt;see the series introduction post here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Understanding Linux capabilities is a fundamental step in mastering container security, as it allows us to move beyond the "all-or-nothing" approach of the traditional root user. By breaking down the monolithic power of root into granular privileges, we can grant a container exactly what it needs to function while significantly reducing the potential blast radius of an exploit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction: Understanding capabilities
&lt;/h2&gt;

&lt;p&gt;To understand how to secure a container, we first need to understand how the Linux kernel handles privileges. The security model of containers is built directly on top of a kernel feature called Capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "All or Nothing" Problem
&lt;/h3&gt;

&lt;p&gt;Traditionally, UNIX-like systems operated on a binary permission model. For the purpose of permission checks, the kernel distinguished between only two categories of processes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Privileged processes&lt;/strong&gt; (Root): Processes with an effective User ID (UID) of 0.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unprivileged processes&lt;/strong&gt; (Standard User): Processes with a non-zero UID.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This created a significant security gap known as the "All or Nothing" problem. A privileged process (UID 0) bypasses almost all kernel permission checks, allowing it to modify system files, install software, and reconfigure the network stack. A standard user, conversely, is strictly bound by permission checks.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;problem arises when a standard user needs to perform a specific action that requires elevated privileges&lt;/strong&gt;. Examples include opening a raw network socket (as ping does to send ICMP packets) or binding to a privileged port (like a web server on port 80). In the old model, the only solution was to give the process full root privileges, usually via the SUID (Set User ID) bit.&lt;/p&gt;

&lt;p&gt;As discussed in previous chapters, the SUID bit is a security risk. It effectively grants a program full superuser powers just to perform one minor task. If an attacker exploits a bug in a SUID binary, they don't just compromise that specific application; they gain full control over the entire system.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are Capabilities?
&lt;/h3&gt;

&lt;p&gt;To solve this "security risk," kernel developers introduced a more nuanced solution called Capabilities. Starting with Linux Kernel 2.2 (in 1999), the privileges traditionally associated with the superuser were broken down into distinct, independent units. These units are called &lt;strong&gt;capabilities&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The concept is straightforward: instead of checking "Is this user root?" the kernel checks "Does this thread have the specific capability to perform this action?"&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instead of being "Root," a process might only have &lt;code&gt;CAP_NET_BIND_SERVICE&lt;/code&gt; (to bind to ports &amp;lt; 1024).&lt;/li&gt;
&lt;li&gt;Instead of being "Root," a process might only have &lt;code&gt;CAP_CHOWN&lt;/code&gt; (to change file ownership).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While this feature was originally scoped only to processes, support for assigning capabilities directly to files was added in 2008. This evolution allows us to assign fine-grained permissions to executables so that processes that previously required UID 0/root permissions no longer need them to function.&lt;/p&gt;

&lt;p&gt;Capabilities are the technical implementation of the &lt;strong&gt;Principle of Least Privilege&lt;/strong&gt;. This security principle dictates that a process should possess only the bare minimum privileges necessary to perform its function and nothing more.&lt;/p&gt;

&lt;p&gt;By using capabilities, we can drastically reduce the attack surface. Suppose a web server runs as a non-root user with only the minimal required capabilities (e.g., &lt;code&gt;CAP_NET_BIND_SERVICE&lt;/code&gt;). An attacker who compromises it gains only that narrow privilege instead of full root control.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Capability Sets
&lt;/h2&gt;

&lt;p&gt;Up to this point, capabilities sound simple: break root privileges into smaller pieces and assign only what is necessary. The real complexity begins when we look at how capabilities are stored, inherited, and transformed between processes and files. If you read the &lt;a href="https://man7.org/linux/man-pages/man7/capabilities.7.html" rel="noopener noreferrer"&gt;man capabilities page&lt;/a&gt;, you might find it terse and difficult to map to real-world scenarios.&lt;/p&gt;

&lt;p&gt;The confusion often stems from two sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Naming Collisions&lt;/strong&gt;: The kernel uses the same names (like "Effective" or "Inheritable") for both &lt;strong&gt;processes and files&lt;/strong&gt;, but they function quite differently depending on where they are applied.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Counter-Intuitive Behavior&lt;/strong&gt;: Capabilities don't behave like the simple "SUID Root" model we are used to. Just because a parent process has a capability doesn't automatically mean the child process gets it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To demystify this, we first need to distinguish between the Process (the active entity) and the File (the passive storage).&lt;/p&gt;

&lt;h3&gt;
  
  
  Process vs. File Capabilities
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;execve()&lt;/code&gt; is a Linux system call that replaces the current running process with a new program. It loads the new executable into memory and starts it, keeping the same process ID but with new code and data.&lt;/p&gt;
&lt;/blockquote&gt;
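&lt;p&gt;You can observe this behavior from the shell. Bash's &lt;code&gt;exec&lt;/code&gt; builtin replaces the shell with a new program via &lt;code&gt;execve()&lt;/code&gt; instead of forking a child. In the sketch below (the function name is hypothetical), the PID printed before and after &lt;code&gt;exec&lt;/code&gt; should be identical.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# show_exec_keeps_pid
# Hypothetical demo: a bash process prints its PID, then uses exec to replace
# itself with a new bash program, which prints its own PID.
show_exec_keeps_pid() {
  bash -c 'echo "before exec: $$"; exec bash -c "echo \"after exec: \$\$\""'
}

show_exec_keeps_pid   # both lines end with the same PID
```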

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Process Capabilities&lt;/strong&gt;: When we talk about a "process" having capabilities, we are technically talking about a thread. In Linux, capability sets are maintained per thread.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Role&lt;/em&gt;: These determine what the running task is actually allowed to do right now.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Lifecycle&lt;/em&gt;: Thread capability sets are copied unchanged during a &lt;code&gt;fork()&lt;/code&gt; (creating a new thread/process) and are transformed during an &lt;code&gt;execve()&lt;/code&gt; (running a new program). That transformation step is where the capability calculation rules apply.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Note&lt;/em&gt;: Most normal processes (like your text editor or shell) have and need zero capabilities. They rely on standard file permissions. Capabilities are generally only needed for system-level administration tasks.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;File Capabilities&lt;/strong&gt;: Binaries on the disk can also have capabilities associated with them.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Role&lt;/em&gt;: These are not "active" permissions. Instead, they are a set of instructions that tell the kernel: "When this file is executed, grant the process these specific privileges."&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Storage&lt;/em&gt;: These are stored in the file's &lt;strong&gt;Extended Attributes (xattrs)&lt;/strong&gt;, specifically within &lt;code&gt;security.capability&lt;/code&gt;. File capabilities depend on filesystem support for extended attributes (most modern filesystems support this). For example, in ext3/ext4, extended attributes are stored in the inode or in additional disk blocks. Many backup tools do not preserve extended attributes by default. Without preserving xattrs, file capabilities will be silently lost.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Copying&lt;/em&gt;: A plain &lt;code&gt;cp&lt;/code&gt; does not copy extended attributes, so the copied binary loses its capabilities. To keep them, copy the file with the &lt;code&gt;--preserve=all&lt;/code&gt; (or &lt;code&gt;--preserve=xattr&lt;/code&gt;) option. Example: &lt;code&gt;cp --preserve=all /origin/path /dest/path&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Constraint&lt;/em&gt;: Writing to this extended attribute requires the &lt;code&gt;CAP_SETFCAP&lt;/code&gt; capability. This ensures that standard users cannot simply grant themselves superpowers by editing a binary's attributes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The 5 Capability Sets
&lt;/h3&gt;

&lt;p&gt;To manage how privileges are granted, inherited, and limited, Linux uses five distinct "sets" of capabilities (which are represented as bit masks). Think of these as five different buckets that a process carries.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Set&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Process Capabilities&lt;/th&gt;
&lt;th&gt;File Capabilities&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Permitted (P)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The superset of what a process can do. A process can move capabilities from here to the Effective set, but it cannot add new ones that aren't already here.&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Effective (E)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The Active set. This is the only set the kernel actually checks when a process tries to do something (like open a port). If a capability is in Permitted but not Effective, the action fails.&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inheritable (I)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Capabilities that may be preserved across &lt;code&gt;execve()&lt;/code&gt; into a new program. Having a capability here isn't enough on its own; the executable must also be "willing" to receive it (via its file Inheritable set).&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bounding (B)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The hard limit. A capability missing from the Bounding set can never be gained from a file's Permitted set during &lt;code&gt;execve()&lt;/code&gt;, and it cannot be added to the Inheritable set.&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ambient (A)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Added in Linux 4.3 to fix the "Inheritance Problem." It lets ordinary binaries that have no file capabilities (and are not SUID) inherit capabilities from their parent across &lt;code&gt;execve()&lt;/code&gt;.&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Linux defines five capability sets for each thread:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Thread Permitted Set (P)&lt;/strong&gt;: The permitted set is the thread's upper bound of capabilities. It defines the maximum privilege scope the thread can ever exercise. A thread may call &lt;code&gt;capset()&lt;/code&gt; to move capabilities from &lt;strong&gt;Permitted&lt;/strong&gt; into the &lt;strong&gt;Effective&lt;/strong&gt; set (the capabilities that are actually checked by the kernel). It may also use &lt;code&gt;capset()&lt;/code&gt; to place capabilities into the &lt;strong&gt;Inheritable&lt;/strong&gt; set (capabilities it is allowed to pass across an &lt;code&gt;execve()&lt;/code&gt; when combined with the executed file's inheritable capabilities). A thread can never use &lt;code&gt;capset()&lt;/code&gt; to add &lt;em&gt;new&lt;/em&gt; capabilities to its own permitted set. Once a capability is dropped from it, the thread can regain it only by executing a set-user-ID-root program or a file whose capabilities grant it. (&lt;code&gt;CAP_SETPCAP&lt;/code&gt; lets a thread add capabilities from its bounding set to its inheritable set, not to its permitted set.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thread Effective Set (E)&lt;/strong&gt;: This is the set that the kernel actually checks during permission evaluation. If a capability is not in the effective set, the kernel behaves as if the process does not have it. The effective set is what truly matters during system calls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thread Inheritable Set (I)&lt;/strong&gt;: The inheritable set controls what capabilities may be passed across &lt;code&gt;execve()&lt;/code&gt; to a different binary. A capability in the thread inheritable set is not automatically granted to child processes. It only influences what may become permitted in the new program, and only if the capability also appears in the executed file's inheritable set. The thread inheritable set and the file inheritable set are different things (this is where many people get confused; more on that later).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bounding Set (B)&lt;/strong&gt;: The bounding set acts as a hard ceiling on what capabilities a process can ever gain through &lt;code&gt;execve()&lt;/code&gt;. Even if a file has a capability marked as permitted, if that capability isn't in the bounding set, the process can never acquire it. It also limits which capabilities can be added to the inheritable set. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ambient Set (A)&lt;/strong&gt;: The ambient set was introduced in Linux 4.3 to solve the problem of passing capabilities to ordinary binaries that have no file capabilities set. Any capability in the ambient set is automatically added to both the permitted and effective sets of the new process after &lt;code&gt;execve()&lt;/code&gt;, even for plain, unmodified binaries. However, the ambient set is cleared when the executed program is set-user-ID, set-group-ID, or has file capabilities. To add a capability to the ambient set, it must already be in both your permitted and inheritable sets. Dropping it from either one automatically removes it from the ambient set as well.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Files only have:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;File Permitted Set&lt;/strong&gt;: The file permitted set defines the capabilities a binary is allowed to gain when executed, regardless of what the thread already has. These capabilities are added to the new process's permitted set after &lt;code&gt;execve()&lt;/code&gt;, but only if they are also allowed by the bounding set.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File Inheritable Set&lt;/strong&gt;: The file inheritable set specifies which capabilities the binary is willing to accept from the thread's inheritable set during &lt;code&gt;execve()&lt;/code&gt;. Only capabilities present in both the thread's inheritable set and the file's inheritable set will be carried over into the new process's permitted set.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File Effective Flag&lt;/strong&gt;: Unlike the other file sets, the effective field is just a single bit, not a set. When it is set, the kernel automatically raises all of the new process's permitted capabilities into its effective set after &lt;code&gt;execve()&lt;/code&gt;. This is needed for older, capability-unaware binaries that don't call &lt;code&gt;capset()&lt;/code&gt; to raise their own capabilities.&lt;/li&gt;
&lt;/ol&gt;
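
&lt;p&gt;The &lt;code&gt;capset()&lt;/code&gt; rules described for the thread sets above boil down to a handful of subset checks. The following is a minimal Python sketch, not kernel code: it models each set as a Python set of capability names, and &lt;code&gt;capset_allowed&lt;/code&gt; is a hypothetical helper. The checks follow the &lt;code&gt;capset()&lt;/code&gt; constraints listed in the &lt;code&gt;capabilities(7)&lt;/code&gt; man page; securebits and the ambient set are left out.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
# Hypothetical model of the checks the kernel applies when a thread calls
# capset(). Capability sets are modelled as Python sets of names.


def capset_allowed(old, new):
    """Return True if a thread with sets `old` may switch to sets `new`.

    `old` has "permitted", "effective", "inheritable" and "bounding" keys;
    `new` has "permitted", "effective" and "inheritable" keys.
    """
    # The permitted set can only shrink; it never gains new capabilities.
    if not new["permitted"].issubset(old["permitted"]):
        return False
    # The effective set must be drawn from the new permitted set.
    if not new["effective"].issubset(new["permitted"]):
        return False
    # Inheritable additions must come from the bounding set.
    if not new["inheritable"].issubset(old["inheritable"].union(old["bounding"])):
        return False
    # Without CAP_SETPCAP in the effective set, inheritable additions must
    # also come from the current permitted set.
    if "CAP_SETPCAP" not in old["effective"]:
        allowed = old["inheritable"].union(old["permitted"])
        if not new["inheritable"].issubset(allowed):
            return False
    return True


old = {
    "permitted": {"CAP_NET_RAW", "CAP_NET_BIND_SERVICE"},
    "effective": set(),
    "inheritable": set(),
    "bounding": {"CAP_NET_RAW", "CAP_NET_BIND_SERVICE", "CAP_SYS_ADMIN"},
}

# Raising a permitted capability into the effective set is allowed.
print(capset_allowed(old, {"permitted": old["permitted"],
                           "effective": {"CAP_NET_RAW"},
                           "inheritable": set()}))  # True

# Adding a capability the thread never had to its permitted set is refused.
print(capset_allowed(old, {"permitted": {"CAP_NET_RAW", "CAP_SYS_ADMIN"},
                           "effective": set(),
                           "inheritable": set()}))  # False
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;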

&lt;h3&gt;
  
  
  How Capabilities are Calculated
&lt;/h3&gt;

&lt;p&gt;When a thread executes a binary (via &lt;code&gt;execve()&lt;/code&gt;), the kernel calculates its new capability sets using a specific formula that combines what the thread had before the call with what the file allows.&lt;/p&gt;

&lt;p&gt;Ignoring the ambient set for now (it is covered below), the logic can be simplified as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Permitted Set Calculation&lt;/strong&gt;: The new permitted set is the union of two sources: capabilities that exist in both the thread's inheritable set and the file's inheritable set, plus capabilities that exist in the file's permitted set filtered through the bounding set.

&lt;ul&gt;
&lt;li&gt;Formula: &lt;code&gt;New Permitted = (Old Inheritable AND File Inheritable) OR (File Permitted AND Bounding Set)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Effective Set Calculation&lt;/strong&gt;: If the file's effective bit is set, the new effective set equals the full new permitted set, meaning all capabilities are immediately active. Otherwise, the effective set starts empty and the program must raise capabilities itself with &lt;code&gt;capset()&lt;/code&gt;.

&lt;ul&gt;
&lt;li&gt;Formula: &lt;code&gt;New Effective = New Permitted if File Effective Flag is set, else 0&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Inheritable Set Calculation&lt;/strong&gt;: The inheritable set is simply carried over unchanged from the old thread; &lt;code&gt;execve()&lt;/code&gt; does not modify it.

&lt;ul&gt;
&lt;li&gt;Formula: &lt;code&gt;New Inheritable = Old Inheritable&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
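
&lt;p&gt;The three formulas above translate almost directly into code. Here is a minimal Python sketch that models capability sets as Python sets of names; &lt;code&gt;exec_simplified&lt;/code&gt; and the example binaries are hypothetical, and, like the formulas, it ignores the ambient set and the special handling of root.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
# Simplified model of the execve() rules listed above. It ignores the ambient
# set and the special cases for root and set-user-ID-root programs.


def exec_simplified(thread, binary):
    """Return the thread's new sets after executing `binary`.

    `thread` has "inheritable" and "bounding" sets; `binary` has "permitted"
    and "inheritable" sets plus a boolean "effective" flag.
    """
    # New Permitted = (Old Inheritable AND File Inheritable)
    #                 OR (File Permitted AND Bounding Set)
    permitted = thread["inheritable"].intersection(binary["inheritable"]).union(
        binary["permitted"].intersection(thread["bounding"])
    )
    # New Effective = New Permitted if the file's effective flag is set, else empty
    effective = set(permitted) if binary["effective"] else set()
    # New Inheritable = Old Inheritable
    inheritable = set(thread["inheritable"])
    return {"permitted": permitted, "effective": effective, "inheritable": inheritable}


# An unprivileged shell executing a binary with cap_net_raw=ep (like ping)
shell = {"inheritable": set(), "bounding": {"CAP_NET_RAW", "CAP_SYS_ADMIN"}}
ping = {"permitted": {"CAP_NET_RAW"}, "inheritable": set(), "effective": True}
print(exec_simplified(shell, ping))
# {'permitted': {'CAP_NET_RAW'}, 'effective': {'CAP_NET_RAW'}, 'inheritable': set()}

# The same binary, run after CAP_NET_RAW was dropped from the bounding set
restricted = {"inheritable": set(), "bounding": {"CAP_SYS_ADMIN"}}
print(exec_simplified(restricted, ping)["permitted"])  # set()
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The second call shows the bounding set acting as a ceiling: the file still lists &lt;code&gt;CAP_NET_RAW&lt;/code&gt; as permitted, but the process does not receive it.&lt;/p&gt;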

&lt;p&gt;The following diagram shows the relationship between the different capability sets and how they interact during process creation and execution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvpoth959k9kadtnxw2eo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvpoth959k9kadtnxw2eo.png" alt=" " width="800" height="625"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You might notice a gap in the logic described above. Before the ambient set existed, if you wanted to run an ordinary binary or script with capabilities, say a plain Python script, you were stuck. Putting a capability in the inheritable set had no effect unless the target binary also had that capability in its file inheritable set, which meant you couldn't pass privileges down to unmodified binaries without touching the files themselves.&lt;/p&gt;

&lt;p&gt;The ambient set solves this. Any capability in the ambient set is automatically added to the new process's permitted and effective sets after &lt;code&gt;execve()&lt;/code&gt;, even if the binary has no file capabilities set at all. This is how modern container runtimes can run standard, unmodified applications with specific privileges without needing to alter the binaries inside the container image.&lt;/p&gt;
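
&lt;p&gt;To see the difference the ambient set makes, the simplified model can be extended with the ambient rules documented in the &lt;code&gt;capabilities(7)&lt;/code&gt; man page. This is a sketch under stated assumptions: &lt;code&gt;exec_with_ambient&lt;/code&gt; is a hypothetical helper, a binary is treated as having file capabilities when any of its file sets or its effective flag is non-empty, and root-specific rules are not modeled.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
# Sketch of the execve() rules including the ambient set, following the
# transformation documented in capabilities(7). Root and set-user-ID-root
# special cases are not modelled.


def exec_with_ambient(thread, binary):
    """Return the thread's new capability sets after executing `binary`.

    `binary` has "permitted"/"inheritable" sets, an "effective" flag and a
    "setid" flag (True for set-user-ID or set-group-ID programs).
    """
    has_file_caps = bool(binary["permitted"] or binary["inheritable"] or binary["effective"])
    # Executing a "privileged" program clears the ambient set.
    privileged = has_file_caps or binary["setid"]
    ambient = set() if privileged else set(thread["ambient"])
    permitted = (
        thread["inheritable"].intersection(binary["inheritable"])
        .union(binary["permitted"].intersection(thread["bounding"]))
        .union(ambient)
    )
    # Without the file effective flag, only the ambient capabilities are active.
    effective = set(permitted) if binary["effective"] else set(ambient)
    return {
        "permitted": permitted,
        "effective": effective,
        "inheritable": set(thread["inheritable"]),
        "ambient": ambient,
    }


cap = "CAP_NET_BIND_SERVICE"
plain_python = {"permitted": set(), "inheritable": set(), "effective": False, "setid": False}

# The capability is only inheritable: a plain binary receives nothing.
no_ambient = {"inheritable": {cap}, "bounding": {cap}, "ambient": set()}
print(exec_with_ambient(no_ambient, plain_python)["effective"])  # set()

# The capability is also ambient: the plain binary gets it in P and E.
with_ambient = {"inheritable": {cap}, "bounding": {cap}, "ambient": {cap}}
print(exec_with_ambient(with_ambient, plain_python)["effective"])  # {'CAP_NET_BIND_SERVICE'}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;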

&lt;h2&gt;
  
  
  Inspecting &amp;amp; Manipulating Capabilities
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Common Linux Capabilities
&lt;/h3&gt;

&lt;p&gt;Before we can meaningfully inspect anything, it helps to have a mental map of the most important capabilities and what they actually allow. Linux defines over 40 capabilities, but a handful appear constantly in security-relevant contexts. The &lt;a href="https://man7.org/linux/man-pages/man7/capabilities.7.html" rel="noopener noreferrer"&gt;full list can be found in the documentation&lt;/a&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Short Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CAP_CHOWN&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Allows a process to make arbitrary changes to file UIDs and GIDs, that is, to change the owner and group of any file.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CAP_SETUID&lt;/code&gt; / &lt;code&gt;CAP_SETGID&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Allow a process to change its own UID and GID, which is how &lt;code&gt;su&lt;/code&gt; and &lt;code&gt;sudo&lt;/code&gt; work. A process with &lt;code&gt;CAP_SETUID&lt;/code&gt; can effectively become any user on the system, including root, making it nearly as dangerous as having full root.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CAP_DAC_OVERRIDE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stands for "Discretionary Access Control Override." A process with this capability can bypass standard file read, write, and execute permission checks. In practical terms, this means it can read or write any file on the system regardless of its ownership or permissions. It does not bypass MAC (Mandatory Access Control) systems like SELinux or AppArmor, but it completely defeats the traditional UNIX permission model.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CAP_NET_BIND_SERVICE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Allows a process to bind to privileged ports, those below 1024, without needing full root access. This is the correct and minimal capability to assign to a web server that needs to listen on port 80 or 443. Without it, only root processes can bind to these ports.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CAP_NET_RAW&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Grants the ability to use raw and packet sockets, and to bind to any address for transparent proxying. This is what the &lt;code&gt;ping&lt;/code&gt; command historically needed to craft ICMP packets. It is also what an attacker needs to perform packet sniffing or craft arbitrary network packets, making it a capability worth watching closely.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CAP_NET_ADMIN&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;This is one of the most powerful networking capabilities. It grants permission to perform a broad range of network configuration tasks: configuring network interfaces, managing routing tables, setting firewall rules with &lt;code&gt;iptables&lt;/code&gt;, and enabling promiscuous mode on a network interface. Because it covers so much ground, it's a frequent target during container escapes. It acts on the network namespace the process belongs to, so a container with &lt;code&gt;CAP_NET_ADMIN&lt;/code&gt; that shares the host's network namespace, or escapes its own, can reconfigure the host's network stack.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CAP_SYS_ADMIN&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;This is often described as the "new root." It is by far the broadest capability in Linux, covering an enormous range of system administration operations: mounting and unmounting filesystems, managing namespaces via &lt;code&gt;clone()&lt;/code&gt; and &lt;code&gt;unshare()&lt;/code&gt;, loading kernel modules (in combination with &lt;code&gt;CAP_SYS_MODULE&lt;/code&gt;), performing &lt;code&gt;chroot()&lt;/code&gt;, and dozens of other privileged operations. The Linux man page lists so many permissions under &lt;code&gt;CAP_SYS_ADMIN&lt;/code&gt; that security practitioners generally treat its presence in a container as equivalent to running the container as root. If you see it, treat it as a red flag.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CAP_SYS_PTRACE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Allows a process to trace arbitrary processes using &lt;code&gt;ptrace()&lt;/code&gt;, the system call that debuggers like &lt;code&gt;gdb&lt;/code&gt; rely on. In a container context, this is particularly dangerous because &lt;code&gt;ptrace()&lt;/code&gt; can be used to inspect and modify the memory of other processes. If the container shares the host's PID namespace, those targets can include privileged host processes, which can lead to a container escape.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CAP_SYS_MODULE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Allows loading and unloading kernel modules. This is an extremely high-risk capability because a kernel module runs in kernel space with no restrictions whatsoever. A process with this capability can load a malicious module that does anything the kernel itself can do.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Inspecting Capabilities
&lt;/h3&gt;

&lt;p&gt;The libcap utilities (packaged as &lt;code&gt;libcap2-bin&lt;/code&gt; on Ubuntu) provide several tools for examining what capabilities are assigned, whether to a running process or to a file on disk. Using them together gives you a complete picture of the privilege landscape on any system.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The examples in the following sections are run on a standard Ubuntu Server 24.04 VM. Always run these exercises on a disposable test environment, as you may encounter binaries with capabilities that can be dangerous if misused.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Inspecting File Capabilities with &lt;code&gt;getcap&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;getcap&lt;/code&gt; command reads the &lt;code&gt;security.capability&lt;/code&gt; extended attribute from a file and displays it in a human-readable format. This is the primary tool for checking what privileges an executable binary has been granted.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;getcap /usr/bin/ping
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You would typically see output like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/usr/bin/ping cap_net_raw=ep
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells you that the &lt;code&gt;ping&lt;/code&gt; binary has the &lt;code&gt;cap_net_raw&lt;/code&gt; capability, and the &lt;code&gt;=ep&lt;/code&gt; suffix tells you which sets it's in. The letter &lt;code&gt;e&lt;/code&gt; means the &lt;strong&gt;Effective&lt;/strong&gt; flag is set, and &lt;code&gt;p&lt;/code&gt; means the capability is in the &lt;strong&gt;Permitted&lt;/strong&gt; file set. Referring back to our capability calculation formula, this means that when &lt;code&gt;ping&lt;/code&gt; is executed, &lt;code&gt;cap_net_raw&lt;/code&gt; will be added to the new process's permitted set, and because the effective flag is set, it will also be immediately active in the effective set.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Historical Note&lt;/strong&gt;: Since &lt;a href="https://hwchiu.medium.com/relearning-the-ping-command-13dbb6aeb5ba" rel="noopener noreferrer"&gt;"ping sockets" were added directly&lt;/a&gt; to the kernel, the &lt;code&gt;ping&lt;/code&gt; command technically no longer requires any additional Linux capabilities to work, provided the &lt;code&gt;net.ipv4.ping_group_range&lt;/code&gt; sysctl allows the user's group to create them (some distributions leave it disabled). The &lt;code&gt;CAP_NET_RAW&lt;/code&gt; capability is still commonly assigned to the binary for backward compatibility with older kernels and configurations where raw sockets are still required.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You'll commonly see these suffixes in capability strings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;p&lt;/code&gt;&lt;/strong&gt; - The capability is in the file's Permitted set.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;e&lt;/code&gt;&lt;/strong&gt; - The file's Effective bit is set (applies all permitted caps to the effective set immediately).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;i&lt;/code&gt;&lt;/strong&gt; - The capability is in the file's Inheritable set.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ep&lt;/code&gt;&lt;/strong&gt; - Both Effective and Permitted; the most common combination for binaries that need to self-elevate.&lt;/li&gt;
&lt;/ul&gt;
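
&lt;p&gt;Under the hood, &lt;code&gt;getcap&lt;/code&gt; decodes a small binary structure. The Python sketch below reads the attribute with &lt;code&gt;os.getxattr()&lt;/code&gt; and decodes it, assuming the &lt;code&gt;struct vfs_cap_data&lt;/code&gt; layout from the kernel header &lt;code&gt;linux/capability.h&lt;/code&gt;: a little-endian 32-bit &lt;code&gt;magic_etc&lt;/code&gt; word (revision number in the top byte, effective flag in bit 0), followed by permitted and inheritable words for the low 32 capabilities and, in revisions 2 and 3, for the high 32 as well. The helper names are hypothetical, and the masks are printed in hex so that you can pass them to &lt;code&gt;capsh --decode&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import os


def _word(raw, index):
    """Return the index-th little-endian 32-bit word of `raw`."""
    return int.from_bytes(raw[4 * index:4 * index + 4], "little")


def decode_file_caps(raw):
    """Decode the bytes of a security.capability xattr (struct vfs_cap_data)."""
    magic = _word(raw, 0)
    revision = magic // 2**24       # top byte: VFS_CAP_REVISION_1/2/3
    effective = magic % 2 == 1      # bit 0: VFS_CAP_FLAGS_EFFECTIVE
    permitted = _word(raw, 1)
    inheritable = _word(raw, 2)
    if revision in (2, 3):          # revisions 2 and 3 store 64-bit sets
        permitted += _word(raw, 3) * 2**32
        inheritable += _word(raw, 4) * 2**32
    return {
        "revision": revision,
        "effective": effective,
        "permitted": f"{permitted:016x}",      # same hex style as /proc CapPrm
        "inheritable": f"{inheritable:016x}",
    }


def read_file_caps(path):
    """Return the decoded capabilities of `path`, or None if it has none."""
    try:
        raw = os.getxattr(path, "security.capability")
    except OSError:
        return None  # usually ENODATA: no capability attribute on the file
    return decode_file_caps(raw)


if __name__ == "__main__":
    # Synthetic attribute for cap_net_raw=ep: revision 2, effective bit set,
    # permitted bit 13 (CAP_NET_RAW), all other words zero.
    sample = (0x02000001).to_bytes(4, "little") + (2**13).to_bytes(4, "little") + bytes(12)
    print(decode_file_caps(sample))
    # {'revision': 2, 'effective': True, 'permitted': '0000000000002000', 'inheritable': '0000000000000000'}
    print(read_file_caps("/usr/bin/ping"))  # None if ping carries no capabilities
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Revision 3 attributes additionally store a root user ID for file capabilities created inside a user namespace; this sketch ignores that field.&lt;/p&gt;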

&lt;p&gt;One particularly useful flag for &lt;code&gt;getcap&lt;/code&gt; is &lt;code&gt;-r&lt;/code&gt;, which enables recursive searching. To scan an entire filesystem for any binary that has capabilities assigned, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;getcap &lt;span class="nt"&gt;-r&lt;/span&gt; / 2&amp;gt;/dev/null
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;2&amp;gt;/dev/null&lt;/code&gt; part discards permission errors from directories you can't read. This one-liner is a standard step in security audits and CTF (Capture the Flag) challenges alike, since a misconfigured binary with an overly broad capability is a common privilege escalation vector.&lt;/p&gt;
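
&lt;p&gt;If you need the same audit inside a larger script, the scan is easy to reproduce. A minimal Python sketch, assuming Linux and Python 3.3 or later for &lt;code&gt;os.getxattr()&lt;/code&gt;; &lt;code&gt;find_capability_files&lt;/code&gt; is a hypothetical helper that only reports which files carry a &lt;code&gt;security.capability&lt;/code&gt; attribute, leaving the decoding to &lt;code&gt;getcap&lt;/code&gt;. Like the shell one-liner, it silently skips anything it cannot read.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import os


def find_capability_files(root):
    """Yield regular files under `root` that carry a security.capability xattr.

    A rough Python equivalent of `getcap -r`: directories that cannot be
    listed are skipped silently, because os.walk ignores those errors by default.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # capabilities live on the target file, not the link
            try:
                os.getxattr(path, "security.capability", follow_symlinks=False)
            except OSError:
                continue  # no capability attribute, or the file is unreadable
            yield path


if __name__ == "__main__":
    # Scanning "/" mirrors `getcap -r /` but is slow and also walks /proc and /sys.
    for path in find_capability_files("/usr/bin"):
        print(path)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;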

&lt;h4&gt;
  
  
  Inspecting Process Capabilities with &lt;code&gt;getpcaps&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;While &lt;code&gt;getcap&lt;/code&gt; deals with files, &lt;code&gt;getpcaps&lt;/code&gt; shows you the capabilities of a running process, identified by its PID. Let's look at the difference between a normal user process and a root process.&lt;/p&gt;

&lt;p&gt;First, find the PID of your current shell and inspect it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user@container-security:~$ ps
    PID TTY          TIME CMD
   2736 pts/0    00:00:00 bash
   2755 pts/0    00:00:00 ps

user@container-security:~$ getpcaps 2736
2736: =
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;=&lt;/code&gt; output means the process has no capabilities in its permitted, effective, or inheritable sets (this notation does not include the bounding set). This is exactly what you'd expect for an ordinary shell running as a non-root user. It doesn't need any capabilities because it relies entirely on standard file permission checks for everything it does.&lt;/p&gt;

&lt;p&gt;Now compare that to a shell running as root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user@container-security:~$ sudo bash
[sudo] password for user:
root@container-security:/home/user#
root@container-security:/home/user# ps
    PID TTY          TIME CMD
   2761 pts/1    00:00:00 sudo
   2762 pts/1    00:00:00 bash
   2769 pts/1    00:00:00 ps
root@container-security:/home/user# getpcaps 2762
2762: =ep
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A root shell carries the full complement of capabilities (&lt;code&gt;=ep&lt;/code&gt;) in both its Permitted and Effective sets, giving it unconstrained access to virtually every privileged operation on the system. This is exactly the scenario that the Principle of Least Privilege is designed to avoid.&lt;/p&gt;

&lt;p&gt;One subtle and dangerous pitfall to be aware of is a capability string with nothing before the equals sign. When you inspect such a process with &lt;code&gt;getpcaps&lt;/code&gt;, you'll see something like this (it is exactly what we got for the root shell above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;PROCESS_PID&amp;gt;: =ep
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks as if the process has no specific capabilities, and one might assume it's harmless. It is the exact opposite. In this notation, an empty capability list before &lt;code&gt;=ep&lt;/code&gt; is shorthand for "all capabilities," so the output is equivalent to &lt;code&gt;&amp;lt;PROCESS_PID&amp;gt;: all=ep&lt;/code&gt;: every capability is in both the Effective and Permitted sets. The same is true for files: a binary that &lt;code&gt;getcap&lt;/code&gt; reports as &lt;code&gt;=ep&lt;/code&gt; gains every capability allowed by the bounding set when executed.&lt;/p&gt;

&lt;h4&gt;
  
  
  Inspecting the Current Shell with &lt;code&gt;capsh&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;capsh&lt;/code&gt; (Capability Shell) is a versatile tool for both inspecting and launching processes with specific capability sets. Its &lt;code&gt;--print&lt;/code&gt; flag dumps a comprehensive view of the current shell's capability state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;capsh &lt;span class="nt"&gt;--print&lt;/span&gt;
Current: &lt;span class="o"&gt;=&lt;/span&gt;
Bounding &lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore
Ambient &lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
Current IAB:
Securebits: 00/0x0/1&lt;span class="s1"&gt;'b0 (no-new-privs=0)
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=1000(user) euid=1000(user)
gid=1000(user)
groups=27(sudo),1000(user)
Guessed mode: HYBRID (4)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output tells you several things at once. &lt;code&gt;Current&lt;/code&gt; shows the thread's effective, permitted, and inheritable sets in the same notation used by &lt;code&gt;getpcaps&lt;/code&gt;; here it is empty. &lt;code&gt;Bounding set&lt;/code&gt; shows the hard ceiling. Notice that even for a non-root user, the bounding set contains every capability, yet none of them appear in the current set unless explicitly granted. &lt;code&gt;Ambient set&lt;/code&gt; is empty here, meaning no capabilities will be passed automatically to programs this shell executes.&lt;/p&gt;

&lt;p&gt;This is much richer than &lt;code&gt;getpcaps&lt;/code&gt; for understanding the full capability context of your current process.&lt;/p&gt;

&lt;h4&gt;
  
  
  Reading Raw Bitmasks from &lt;code&gt;/proc&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;For low-level inspection or scripting, you can read capability information directly from the kernel's process filesystem. Every running process has a &lt;code&gt;status&lt;/code&gt; file under &lt;code&gt;/proc/&amp;lt;pid&amp;gt;/status&lt;/code&gt; that contains raw hexadecimal bitmask values for each capability set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;ps
    PID TTY          TIME CMD
   2736 pts/0    00:00:00 bash
   3184 pts/0    00:00:00 ps
user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /proc/2736/status | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; cap
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each line corresponds to a capability set: &lt;code&gt;CapInh&lt;/code&gt; (Inheritable), &lt;code&gt;CapPrm&lt;/code&gt; (Permitted), &lt;code&gt;CapEff&lt;/code&gt; (Effective), &lt;code&gt;CapBnd&lt;/code&gt; (Bounding), and &lt;code&gt;CapAmb&lt;/code&gt; (Ambient). The values are 64-bit hexadecimal bitmasks where each bit position corresponds to a specific capability number.&lt;/p&gt;

&lt;p&gt;Reading these raw masks directly isn't very human-friendly, but &lt;code&gt;capsh&lt;/code&gt; can decode them for you with the &lt;code&gt;--decode&lt;/code&gt; flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;capsh &lt;span class="nt"&gt;--decode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;000001ffffffffff
&lt;span class="nv"&gt;0x000001ffffffffff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is especially useful in automated scripts or when you need to understand the capabilities of a process that &lt;code&gt;getpcaps&lt;/code&gt; can't reach, such as inside a container's namespace.&lt;/p&gt;
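
&lt;p&gt;If &lt;code&gt;capsh&lt;/code&gt; is not available, for example inside a minimal container image, the decoding is simple enough to do yourself. This Python sketch reads the &lt;code&gt;Cap*&lt;/code&gt; lines from &lt;code&gt;/proc/self/status&lt;/code&gt; (or another PID) and maps each set bit to a name. The helper names are hypothetical, and the name table is an assumption covering capabilities 0 to 40 in the same order as the bounding set printed by &lt;code&gt;capsh --print&lt;/code&gt; above; unknown bits are shown as &lt;code&gt;cap_N&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
# Capability names in kernel bit order (bit 0 is cap_chown), matching the
# bounding set printed by capsh --print.
CAP_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog", "cap_wake_alarm",
    "cap_block_suspend", "cap_audit_read", "cap_perfmon", "cap_bpf",
    "cap_checkpoint_restore",
]
NAMES_BY_BIT = dict(enumerate(CAP_NAMES))


def decode_mask(hex_mask):
    """Translate a hex mask such as '0000000000000400' into capability names."""
    bits = bin(int(hex_mask, 16))[2:][::-1]  # least significant bit first
    return [NAMES_BY_BIT.get(i, f"cap_{i}") for i, bit in enumerate(bits) if bit == "1"]


def process_caps(pid="self"):
    """Read the Cap* lines of /proc/PID/status and decode each mask."""
    sets = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("Cap"):
                key, value = line.split(":", 1)
                sets[key] = decode_mask(value.strip())
    return sets


if __name__ == "__main__":
    for key, names in process_caps().items():
        print(f"{key}: {', '.join(names) or '(none)'}")
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;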

&lt;h3&gt;
  
  
  Assigning Capabilities with &lt;code&gt;setcap&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;setcap&lt;/code&gt; writes a capability set directly into the &lt;code&gt;security.capability&lt;/code&gt; extended attribute of a file. The general syntax is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;setcap &amp;lt;capability&amp;gt;+&amp;lt;sets&amp;gt; /path/to/binary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, to grant a binary the &lt;code&gt;CAP_SETUID&lt;/code&gt; capability in both the Permitted and Effective sets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;setcap cap_setuid+ep /path/to/file
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that running &lt;code&gt;setcap&lt;/code&gt; itself requires the &lt;code&gt;CAP_SETFCAP&lt;/code&gt; capability. This privilege is automatically granted to root, which is why the &lt;code&gt;sudo&lt;/code&gt; prefix is needed when running as a normal user.&lt;/p&gt;

&lt;p&gt;An important subtlety: &lt;strong&gt;&lt;code&gt;setcap&lt;/code&gt; is not additive&lt;/strong&gt;. Each invocation of &lt;code&gt;setcap&lt;/code&gt; completely replaces the capability set of the file. If you want to assign multiple capabilities, you must specify all of them in a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;setcap cap_net_bind_service,cap_net_raw+ep /path/to/binary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running &lt;code&gt;setcap&lt;/code&gt; twice with different capabilities will result in only the second set being stored.&lt;/p&gt;

&lt;h3&gt;
  
  
  Removing Capabilities with &lt;code&gt;setcap -r&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;To strip all capabilities from a file, use the &lt;code&gt;-r&lt;/code&gt; (remove) flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;setcap &lt;span class="nt"&gt;-r&lt;/span&gt; /path/to/program
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After this, &lt;code&gt;getcap&lt;/code&gt; on that file will return no output, and the binary will run with whatever privileges the executing user's process has, just like any other ordinary binary.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Practical Example: Assigning &lt;code&gt;CAP_NET_BIND_SERVICE&lt;/code&gt; to a Custom Binary
&lt;/h3&gt;

&lt;p&gt;On Linux, ports below 1024 are called privileged ports. Binding to them is restricted by the kernel to prevent unprivileged users from impersonating well-known services like HTTP (port 80) or HTTPS (port 443). Traditionally, the only way to bind to these ports was to run your process as root. With &lt;code&gt;CAP_NET_BIND_SERVICE&lt;/code&gt; we can grant exactly that one permission to a specific binary, and nothing else.&lt;/p&gt;

&lt;p&gt;Try to start a Python HTTP server on port 80 as a non-root user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; http.server 80
Traceback &lt;span class="o"&gt;(&lt;/span&gt;most recent call last&lt;span class="o"&gt;)&lt;/span&gt;:
  File &lt;span class="s2"&gt;"&amp;lt;frozen runpy&amp;gt;"&lt;/span&gt;, line 198, &lt;span class="k"&gt;in &lt;/span&gt;_run_module_as_main
  File &lt;span class="s2"&gt;"&amp;lt;frozen runpy&amp;gt;"&lt;/span&gt;, line 88, &lt;span class="k"&gt;in &lt;/span&gt;_run_code
  File &lt;span class="s2"&gt;"/usr/lib/python3.12/http/server.py"&lt;/span&gt;, line 1314, &lt;span class="k"&gt;in&lt;/span&gt; &amp;lt;module&amp;gt;
    &lt;span class="nb"&gt;test&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
  File &lt;span class="s2"&gt;"/usr/lib/python3.12/http/server.py"&lt;/span&gt;, line 1261, &lt;span class="k"&gt;in &lt;/span&gt;&lt;span class="nb"&gt;test
    &lt;/span&gt;with ServerClass&lt;span class="o"&gt;(&lt;/span&gt;addr, HandlerClass&lt;span class="o"&gt;)&lt;/span&gt; as httpd:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File &lt;span class="s2"&gt;"/usr/lib/python3.12/socketserver.py"&lt;/span&gt;, line 457, &lt;span class="k"&gt;in &lt;/span&gt;__init__
    self.server_bind&lt;span class="o"&gt;()&lt;/span&gt;
  File &lt;span class="s2"&gt;"/usr/lib/python3.12/http/server.py"&lt;/span&gt;, line 1308, &lt;span class="k"&gt;in &lt;/span&gt;server_bind
    &lt;span class="k"&gt;return &lt;/span&gt;super&lt;span class="o"&gt;()&lt;/span&gt;.server_bind&lt;span class="o"&gt;()&lt;/span&gt;
           ^^^^^^^^^^^^^^^^^^^^^
  File &lt;span class="s2"&gt;"/usr/lib/python3.12/http/server.py"&lt;/span&gt;, line 136, &lt;span class="k"&gt;in &lt;/span&gt;server_bind
    socketserver.TCPServer.server_bind&lt;span class="o"&gt;(&lt;/span&gt;self&lt;span class="o"&gt;)&lt;/span&gt;
  File &lt;span class="s2"&gt;"/usr/lib/python3.12/socketserver.py"&lt;/span&gt;, line 473, &lt;span class="k"&gt;in &lt;/span&gt;server_bind
    self.socket.bind&lt;span class="o"&gt;(&lt;/span&gt;self.server_address&lt;span class="o"&gt;)&lt;/span&gt;
PermissionError: &lt;span class="o"&gt;[&lt;/span&gt;Errno 13] Permission denied
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The kernel blocks it immediately. Checking the capabilities of the Python binary confirms that it has no special permissions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;getcap /usr/bin/python3.12 &lt;span class="c"&gt;# empty output, no capabilities assigned, so binding to a privileged port is forbidden&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ports 1024 and above work fine without any capabilities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; http.server 8080
Serving HTTP on 0.0.0.0 port 8080 &lt;span class="o"&gt;(&lt;/span&gt;http://0.0.0.0:8080/&lt;span class="o"&gt;)&lt;/span&gt; ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This confirms the problem is specifically about privileged ports, not Python itself.&lt;/p&gt;
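
&lt;p&gt;You can reproduce this check without starting a web server by calling &lt;code&gt;bind()&lt;/code&gt; directly. A minimal Python sketch; &lt;code&gt;try_bind&lt;/code&gt; is a hypothetical helper that binds and immediately closes a TCP socket. Run as an unprivileged user with default settings, you would expect &lt;code&gt;Permission denied&lt;/code&gt; for port 80 and &lt;code&gt;ok&lt;/code&gt; for port 8080, assuming nothing else is using 8080 and the &lt;code&gt;net.ipv4.ip_unprivileged_port_start&lt;/code&gt; sysctl is still at its default of 1024.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import socket


def try_bind(port, host="0.0.0.0"):
    """Try to bind a TCP socket to (host, port); return "ok" or the OS error text."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            return exc.strerror  # "Permission denied" (EACCES) for privileged ports
        return "ok"


if __name__ == "__main__":
    for port in (80, 8080):
        print(port, try_bind(port))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;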

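&lt;p&gt;Assign &lt;code&gt;CAP_NET_BIND_SERVICE&lt;/code&gt; to the Python binary. Because &lt;code&gt;setcap&lt;/code&gt; does not operate on symbolic links, first resolve &lt;code&gt;/usr/bin/python3&lt;/code&gt; to the real file:&lt;br&gt;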
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;which python3
/usr/bin/python3
user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;readlink&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; /usr/bin/python3
/usr/bin/python3.12
user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;setcap cap_net_bind_service+ep /usr/bin/python3.12
user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;getcap /usr/bin/python3.12
/usr/bin/python3.12 &lt;span class="nv"&gt;cap_net_bind_service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ep
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And confirm the file permissions are completely unchanged:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt; /usr/bin/python3.12
&lt;span class="nt"&gt;-rwxr-xr-x&lt;/span&gt; 1 root root 8020928 Jan 22 20:57 /usr/bin/python3.12
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No SUID bit. No ownership change. Nothing visible to an &lt;code&gt;ls&lt;/code&gt; check.&lt;/p&gt;

&lt;p&gt;Confirm it works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; http.server 80
Serving HTTP on 0.0.0.0 port 80 &lt;span class="o"&gt;(&lt;/span&gt;http://0.0.0.0:80/&lt;span class="o"&gt;)&lt;/span&gt; ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It binds successfully. From another terminal, verify the running process has exactly one capability and nothing more:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;pgrep python3
3257
user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;getpcaps 3257
3257: &lt;span class="nv"&gt;cap_net_bind_service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ep
user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /proc/3257/status | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; cap
CapInh: 0000000000000000
CapPrm: 0000000000000400
CapEff: 0000000000000400
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
user@container-security:~&lt;span class="err"&gt;$&lt;/span&gt;
user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;capsh &lt;span class="nt"&gt;--decode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0000000000000400 &lt;span class="c"&gt;# Decode the bitmask to confirm&lt;/span&gt;
&lt;span class="nv"&gt;0x0000000000000400&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cap_net_bind_service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mask &lt;code&gt;0x400&lt;/code&gt; has only bit 10 set, and bit 10 is &lt;code&gt;CAP_NET_BIND_SERVICE&lt;/code&gt;. The process cannot read arbitrary files, cannot change file ownership, and cannot kill other users' processes. Beyond what an ordinary user process can do, it can only bind to privileged ports.&lt;/p&gt;
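
&lt;p&gt;The arithmetic behind that mask is just a power of two: capability number 10 corresponds to 2 to the 10th power, which is 1024, or &lt;code&gt;0x400&lt;/code&gt;. A program can apply the same test to its own &lt;code&gt;CapEff&lt;/code&gt; line, for example to print a clearer error before attempting to bind. A minimal Python sketch, assuming the &lt;code&gt;/proc/self/status&lt;/code&gt; format shown above; &lt;code&gt;has_effective_cap&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
CAP_NET_BIND_SERVICE = 10  # capability number, from linux/capability.h


def has_effective_cap(cap_number, status_path="/proc/self/status"):
    """Return True if bit `cap_number` is set in the CapEff line of status_path."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                mask = int(line.split(":", 1)[1], 16)
                return (mask // 2**cap_number) % 2 == 1
    return False


if __name__ == "__main__":
    print(hex(2**CAP_NET_BIND_SERVICE))  # 0x400, the mask seen in CapEff above
    print(has_effective_cap(CAP_NET_BIND_SERVICE))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;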

&lt;p&gt;To remove the capability and confirm it no longer works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;setcap &lt;span class="nt"&gt;-r&lt;/span&gt; /usr/bin/python3.12
user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; http.server 80
Traceback &lt;span class="o"&gt;(&lt;/span&gt;most recent call last&lt;span class="o"&gt;)&lt;/span&gt;:
  ...
  File &lt;span class="s2"&gt;"/usr/lib/python3.12/socketserver.py"&lt;/span&gt;, line 473, &lt;span class="k"&gt;in &lt;/span&gt;server_bind
    self.socket.bind&lt;span class="o"&gt;(&lt;/span&gt;self.server_address&lt;span class="o"&gt;)&lt;/span&gt;
PermissionError: &lt;span class="o"&gt;[&lt;/span&gt;Errno 13] Permission denied
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Capabilities Security Implications
&lt;/h2&gt;

&lt;p&gt;While capabilities were designed to implement the Principle of Least Privilege and secure your system, they can become a massive liability if misconfigured. Assigning the wrong capability to the wrong binary effectively hands an attacker a clean, built-in mechanism for privilege escalation.&lt;/p&gt;

&lt;p&gt;As we saw in the previous chapter, an empty capability set assigned with the Effective and Permitted flags (&lt;code&gt;=ep&lt;/code&gt;) is shorthand for granting all available capabilities. If a system administrator mistakenly applies this, or even just a specific capability such as &lt;code&gt;CAP_SETUID&lt;/code&gt;, to a script interpreter or a common binary, the entire security model collapses.&lt;/p&gt;

&lt;p&gt;Let's look at how easily an attacker can exploit this using Python. Assume an administrator accidentally ran &lt;code&gt;sudo setcap =ep /usr/bin/python3.12&lt;/code&gt; (or &lt;code&gt;sudo setcap cap_setuid+ep /usr/bin/python3.12&lt;/code&gt;) while trying to fix a permissions issue. In that scenario, escalating privileges to root becomes trivial. All an attacker needs to do is write a one-liner to change their User ID (UID) to root and spawn a shell.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;setcap cap_setuid+ep /usr/bin/python3.12
user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;getcap /usr/bin/python3.12
/usr/bin/python3.12 &lt;span class="nv"&gt;cap_setuid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ep
user@container-security:~&lt;span class="err"&gt;$&lt;/span&gt;
user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s1"&gt;'import os; os.setuid(0); os.execl("/bin/bash", "bash")'&lt;/span&gt;
root@container-security:~# &lt;span class="c"&gt;# notice the prompt changed to root, we are now running a root shell&lt;/span&gt;
root@container-security:~# &lt;span class="nb"&gt;exit
exit
&lt;/span&gt;user@container-security:~&lt;span class="err"&gt;$&lt;/span&gt;
&lt;span class="c"&gt;# remove the capability to prevent this from happening again&lt;/span&gt;
user@container-security:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;setcap &lt;span class="nt"&gt;-r&lt;/span&gt; /usr/bin/python3.12
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's break down exactly what is happening here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;python3 -c&lt;/code&gt;&lt;/strong&gt;: Tells the Python interpreter to execute the following inline code string.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;import os&lt;/code&gt;&lt;/strong&gt;: Imports the standard OS module required to make system calls.&lt;/li&gt;
&lt;li&gt;
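&lt;strong&gt;&lt;code&gt;os.setuid(0)&lt;/code&gt;&lt;/strong&gt;: Leverages the &lt;code&gt;CAP_SETUID&lt;/code&gt; capability to set the process's real, effective, and saved user IDs to 0, which is the root user. Without &lt;code&gt;CAP_SETUID&lt;/code&gt;, this call would fail with a &lt;code&gt;PermissionError&lt;/code&gt;.&lt;/li&gt;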
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;os.execl("/bin/bash", "bash")&lt;/code&gt;&lt;/strong&gt;: Replaces the current Python process with a brand-new bash shell. Because the UID was just changed to 0, this new shell runs entirely as root.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just like that, a standard user account is transformed into a superuser, bypassing all traditional access controls.&lt;/p&gt;
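&lt;p&gt;For readers who want to see the underlying system call, here is a minimal C sketch of what &lt;code&gt;os.setuid(0)&lt;/code&gt; does. It is an illustration, not part of the original walkthrough: it prints the real, effective, and saved UIDs with &lt;code&gt;getresuid(2)&lt;/code&gt; (a Linux/glibc call that needs &lt;code&gt;_GNU_SOURCE&lt;/code&gt;), calls &lt;code&gt;setuid(0)&lt;/code&gt;, and prints them again. The helper names &lt;code&gt;print_uids&lt;/code&gt; and &lt;code&gt;try_setuid_root&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#define _GNU_SOURCE            /* exposes getresuid() in unistd.h */
#include &amp;lt;errno.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;string.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;

/* Hypothetical helper: prints the real, effective and saved user IDs. */
static void print_uids(const char *label) {
    uid_t ruid, euid, suid;
    if (getresuid(&amp;amp;ruid, &amp;amp;euid, &amp;amp;suid) == 0)
        printf("%-8s real=%u effective=%u saved=%u\n",
               label, (unsigned)ruid, (unsigned)euid, (unsigned)suid);
}

/* Hypothetical helper, same call as os.setuid(0):
   returns 0 on success, otherwise the errno value. */
int try_setuid_root(void) {
    return setuid(0) == 0 ? 0 : errno;
}

int main(void) {
    print_uids("before:");
    int err = try_setuid_root();
    if (err != 0)   /* EPERM when the process lacks CAP_SETUID */
        printf("setuid(0) failed: %s\n", strerror(err));
    print_uids("after:");
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run as an ordinary user, &lt;code&gt;setuid(0)&lt;/code&gt; fails with &lt;code&gt;EPERM&lt;/code&gt; and the UIDs stay unchanged. If the compiled binary were instead given &lt;code&gt;cap_setuid+ep&lt;/code&gt;, just like the Python interpreter above, the call would succeed and all three IDs would become 0.&lt;/p&gt;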

&lt;p&gt;The privilege escalation scenarios above become significantly more consequential in a containerized environment, and this is a topic we will explore in depth in a dedicated chapter. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This article is one piece of the Ultimate Container Security Series, an ongoing effort to organize and explain container security concepts in a practical way. If you want to explore related topics or see what’s coming next, the &lt;a href="https://dev.to/0xalphasecurity/ultimate-container-security-series-2628"&gt;series introduction post provides the complete roadmap&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>containers</category>
      <category>cybersecurity</category>
      <category>linux</category>
      <category>docker</category>
    </item>
    <item>
      <title>Chapter 3: Linux File Permissions</title>
      <dc:creator>0xAlphaSecurity</dc:creator>
      <pubDate>Thu, 05 Feb 2026 14:45:21 +0000</pubDate>
      <link>https://forem.com/0xalphasecurity/chapter-3-linux-file-permissions-34ac</link>
      <guid>https://forem.com/0xalphasecurity/chapter-3-linux-file-permissions-34ac</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This post is part of the Ultimate Container Security Series, a structured, multi-part guide covering container security from foundational concepts to runtime protection. For an overview of the series structure, scope, and update schedule, &lt;a href="https://dev.to/0xalphasecurity/ultimate-container-security-series-2628"&gt;see the series introduction post here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Next, we will explore how Linux manages file permissions to ensure security and proper access control. File permissions are a fundamental aspect of Linux security, determining who can read, write, or execute files and directories, whether on a local system or within a containerized environment. In Linux, everything is a file (program code, configuration, hardware devices, etc.), so understanding file permissions is crucial for managing system security effectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Linux file permissions
&lt;/h2&gt;

&lt;p&gt;Linux is built as a multi-user environment, where protecting user data and system integrity is essential. Its file security model is effective, but it can cause problems for users and administrators who are not familiar with how it works.&lt;/p&gt;

&lt;p&gt;File permissions are defined for three classes of users:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;User&lt;/strong&gt;: The owner of the file.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Group&lt;/strong&gt;: A set of users who share access permissions. Groups simplify administration because permissions can be granted to many users at once. Every user belongs to at least one group, their primary group.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Others&lt;/strong&gt;: Everyone else who is not the owner or in the group.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When we create a file and check its permissions using the &lt;code&gt;ls -l&lt;/code&gt; command, we see something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffl7qd11kmjcemgozfrin.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffl7qd11kmjcemgozfrin.png" alt=" " width="800" height="191"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we break down the output column by column:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;first column&lt;/strong&gt; shows the file type and permissions. The first character indicates the file type (&lt;code&gt;-&lt;/code&gt; for regular files, &lt;code&gt;d&lt;/code&gt; for directories, etc.). The next nine characters are the file permissions, divided into three sets of three characters: the first set is for the owner, the second for the group, and the third for others. Each set is built from three possible attributes:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;r&lt;/code&gt; - Read permission. Whether the file may be read. In the case of a directory, this would mean the ability to list the contents of the directory.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;w&lt;/code&gt; - Write permission. Whether the file may be written to or modified. For a directory, this defines whether you can change its contents: without write permission, you cannot create, delete, or rename files in it.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;x&lt;/code&gt; - Execute permission. Whether the file may be executed. For a directory, this decides whether you can enter it (for example with &lt;code&gt;cd&lt;/code&gt;) and access the files inside it by name, including running programs stored there.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;The &lt;strong&gt;second column&lt;/strong&gt; indicates the number of hard links to the file.&lt;/li&gt;

&lt;li&gt;The &lt;strong&gt;third column&lt;/strong&gt; shows the owner of the file.&lt;/li&gt;

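&lt;li&gt;The &lt;strong&gt;fourth column&lt;/strong&gt; shows the group associated with the file.&lt;/li&gt;

&lt;li&gt;The remaining columns show the file size in bytes, the last modification time, and the file name.&lt;/li&gt;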

&lt;/ul&gt;

&lt;p&gt;Let's look at our 3 basic examples from above:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;drwxrwxr-x 2 max max 4096 Jan 13 23:04 documents&lt;/code&gt;: This is a directory (&lt;code&gt;d&lt;/code&gt; at the start). The owner &lt;code&gt;max&lt;/code&gt; has read, write, and execute permissions (&lt;code&gt;rwx&lt;/code&gt;). The group &lt;code&gt;max&lt;/code&gt; also has read, write, and execute permissions (&lt;code&gt;rwx&lt;/code&gt;). Others have read and execute permissions (&lt;code&gt;r-x&lt;/code&gt;), but not write permission.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-rw-rw-r-- 1 max administrator 0 Jan 13 22:49 myfile&lt;/code&gt;: This is a regular file (&lt;code&gt;-&lt;/code&gt; at the start). The owner &lt;code&gt;max&lt;/code&gt; has read and write permissions (&lt;code&gt;rw-&lt;/code&gt;). The group &lt;code&gt;administrator&lt;/code&gt; also has read and write permissions (&lt;code&gt;rw-&lt;/code&gt;). Others have only read permission (&lt;code&gt;r--&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-rwxrwxr-x 1 max max 0 Jan 13 23:05 script.sh&lt;/code&gt;: This is a regular file (&lt;code&gt;-&lt;/code&gt; at the start). The owner &lt;code&gt;max&lt;/code&gt; has read, write, and execute permissions (&lt;code&gt;rwx&lt;/code&gt;). The group &lt;code&gt;max&lt;/code&gt; also has read, write, and execute permissions (&lt;code&gt;rwx&lt;/code&gt;). Others have read and execute permissions (&lt;code&gt;r-x&lt;/code&gt;), but not write permission.&lt;/li&gt;
&lt;/ol&gt;
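&lt;p&gt;The permission string that &lt;code&gt;ls -l&lt;/code&gt; prints is derived from the mode bits the kernel stores for each file. As a rough sketch (not how &lt;code&gt;ls&lt;/code&gt; itself is implemented), the C program below calls &lt;code&gt;stat(2)&lt;/code&gt; and rebuilds the first column. The helper &lt;code&gt;mode_to_string&lt;/code&gt; is hypothetical, only recognizes directories and regular files, and ignores the special bits covered later in this chapter.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;sys/stat.h&amp;gt;

/* Hypothetical helper: builds an ls -l style string such as "-rwxr-xr-x"
   into out (11 bytes). Only directories ('d') and regular files ('-')
   get a type letter; setuid, setgid and sticky bits are ignored. */
void mode_to_string(mode_t mode, char out[11]) {
    static const char letters[] = "rwxrwxrwx";
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,   /* owner  */
        S_IRGRP, S_IWGRP, S_IXGRP,   /* group  */
        S_IROTH, S_IWOTH, S_IXOTH,   /* others */
    };
    out[0] = S_ISDIR(mode) ? 'd' : (S_ISREG(mode) ? '-' : '?');
    for (int i = 0; i &amp;lt; 9; i++)
        out[i + 1] = (mode &amp;amp; bits[i]) ? letters[i] : '-';
    out[10] = '\0';
}

int main(int argc, char *argv[]) {
    const char *path = argc &amp;gt; 1 ? argv[1] : ".";
    struct stat st;
    if (stat(path, &amp;amp;st) != 0) {
        perror("stat");
        return 1;
    }
    char perms[11];
    mode_to_string(st.st_mode, perms);
    printf("%s %s\n", perms, path);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Compiled under a made-up name such as &lt;code&gt;lsmode&lt;/code&gt;, running &lt;code&gt;./lsmode script.sh&lt;/code&gt; should print the same first column that &lt;code&gt;ls -l&lt;/code&gt; shows for that file.&lt;/p&gt;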

&lt;h2&gt;
  
  
  Changing File Permissions
&lt;/h2&gt;

&lt;p&gt;Now that we understand how to read file permissions, let's look at how to change them. The main command used to modify permissions in Linux is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To change file permissions, you must be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the owner of the file, or&lt;/li&gt;
&lt;li&gt;root (the superuser).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Permissions can be changed in two main ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Symbolic mode (letters and operators)&lt;/strong&gt;: more readable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Numeric mode (octal numbers)&lt;/strong&gt;: faster and widely used in scripts&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Symbolic Mode (Using Letters)
&lt;/h3&gt;

&lt;p&gt;Permissions can be defined for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;u&lt;/code&gt;: user (owner)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;g&lt;/code&gt;: group&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;o&lt;/code&gt;: others&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;a&lt;/code&gt;: all (user + group + others)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Operators:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;+&lt;/code&gt;: add permission&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-&lt;/code&gt;: remove permission&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;=&lt;/code&gt;: set exactly (overwrite existing permissions)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Permission bits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;r&lt;/code&gt;: read&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;w&lt;/code&gt;: write&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;x&lt;/code&gt;: execute&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's look at a basic symbolic mode example where we remove the execute permission from the user, add write permission to the group, and add read and write permissions for others:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# BEFORE: -rwxr--r-- 1 max max 0 Jan 13 23:05 script.sh&lt;/span&gt;

&lt;span class="nb"&gt;chmod &lt;/span&gt;u-x,g+w,o+rw script.sh

&lt;span class="c"&gt;# AFTER: -rw-rw-rw- 1 max max 0 Jan 13 23:05 script.sh&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here are a few more practical examples of using symbolic mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1) EXAMPLE: Removes write and execute from group.&lt;/span&gt;

&lt;span class="c"&gt;# BEFORE: -rwxrwxrwx&lt;/span&gt;
&lt;span class="nb"&gt;chmod &lt;/span&gt;g-wx somefile
&lt;span class="c"&gt;# AFTER: -rwxr--rwx&lt;/span&gt;

&lt;span class="c"&gt;# 2) EXAMPLE: Give execute permission to everyone.&lt;/span&gt;
&lt;span class="nb"&gt;chmod &lt;/span&gt;a+x somefile &lt;span class="c"&gt;# (chmod +x somefile is similar, but leaves any bits masked by your umask unchanged)&lt;/span&gt;

&lt;span class="c"&gt;# 3) EXAMPLE: Apply same change to group and others together.&lt;/span&gt;
&lt;span class="nb"&gt;chmod &lt;/span&gt;go-rx somefile

&lt;span class="c"&gt;# 4) EXAMPLE: Set user and group permissions to exactly rwx; permissions for others are left unchanged.&lt;/span&gt;
&lt;span class="nb"&gt;chmod &lt;/span&gt;&lt;span class="nv"&gt;ug&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;rwx somefile

&lt;span class="c"&gt;# 5) EXAMPLE: Copy permissions from another class. Others will receive the same permissions that group currently has.&lt;/span&gt;
&lt;span class="nb"&gt;chmod &lt;/span&gt;&lt;span class="nv"&gt;o&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;g somefile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Using &lt;code&gt;chmod +x&lt;/code&gt; is a common way to make a script executable. But this gives execute permission to everyone. If you want to give execute permission only to the owner, use &lt;code&gt;chmod u+x&lt;/code&gt; instead.&lt;/p&gt;
&lt;/blockquote&gt;
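&lt;p&gt;To make the &lt;code&gt;+&lt;/code&gt;, &lt;code&gt;-&lt;/code&gt;, and &lt;code&gt;=&lt;/code&gt; operators concrete, here is a simplified C sketch of how a list of symbolic clauses could be applied to a numeric mode. It is a hypothetical helper, not the real &lt;code&gt;chmod&lt;/code&gt; parser: it understands only &lt;code&gt;u&lt;/code&gt;, &lt;code&gt;g&lt;/code&gt;, &lt;code&gt;o&lt;/code&gt;, &lt;code&gt;a&lt;/code&gt;, the three operators, and &lt;code&gt;r&lt;/code&gt;/&lt;code&gt;w&lt;/code&gt;/&lt;code&gt;x&lt;/code&gt;. It also ignores the umask and does not support the copy form (&lt;code&gt;o=g&lt;/code&gt;).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;sys/stat.h&amp;gt;

#define ALL_CLASSES (S_IRWXU | S_IRWXG | S_IRWXO)

/* Hypothetical helper: applies clauses such as "u-x,g+w,o+rw" to *mode.
   Supports who = u/g/o/a, op = +/-/=, perms = r/w/x only. Unlike chmod,
   it ignores the umask when no class is given and has no copy form (o=g).
   Returns 0 on success, -1 on a parse error. */
int apply_symbolic(mode_t *mode, const char *spec) {
    const char *p = spec;
    while (*p != '\0') {
        /* 1. Collect the classes (who) this clause applies to. */
        mode_t who = 0;
        for (; *p == 'u' || *p == 'g' || *p == 'o' || *p == 'a'; p++) {
            if (*p == 'u') who |= S_IRWXU;
            else if (*p == 'g') who |= S_IRWXG;
            else if (*p == 'o') who |= S_IRWXO;
            else who |= ALL_CLASSES;
        }
        if (who == 0)
            who = ALL_CLASSES;

        /* 2. Read the operator. */
        char op = *p;
        if (op != '+' &amp;amp;&amp;amp; op != '-' &amp;amp;&amp;amp; op != '=')
            return -1;
        p++;

        /* 3. Collect r/w/x for every class, then keep only the chosen classes. */
        mode_t bits = 0;
        for (; *p == 'r' || *p == 'w' || *p == 'x'; p++) {
            if (*p == 'r') bits |= S_IRUSR | S_IRGRP | S_IROTH;
            else if (*p == 'w') bits |= S_IWUSR | S_IWGRP | S_IWOTH;
            else bits |= S_IXUSR | S_IXGRP | S_IXOTH;
        }
        bits &amp;amp;= who;

        /* 4. Apply the operator. */
        if (op == '+') *mode |= bits;
        else if (op == '-') *mode &amp;amp;= ~bits;
        else *mode = (*mode &amp;amp; ~who) | bits;   /* '=' clears the classes first */

        if (*p == ',') p++;
        else if (*p != '\0') return -1;
    }
    return 0;
}

int main(void) {
    mode_t mode = 0744;                      /* -rwxr--r-- */
    if (apply_symbolic(&amp;amp;mode, "u-x,g+w,o+rw") == 0)
        printf("%03o\n", (unsigned)mode);    /* prints 666, i.e. -rw-rw-rw- */
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Applied to &lt;code&gt;0744&lt;/code&gt; (&lt;code&gt;-rwxr--r--&lt;/code&gt;), the clauses &lt;code&gt;u-x,g+w,o+rw&lt;/code&gt; produce &lt;code&gt;0666&lt;/code&gt; (&lt;code&gt;-rw-rw-rw-&lt;/code&gt;), matching the first symbolic mode example.&lt;/p&gt;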

&lt;h3&gt;
  
  
  Numeric Mode (Using Octal Numbers)
&lt;/h3&gt;

&lt;p&gt;Linux also allows permissions to be set using numbers. This is called octal mode and is very common in administration and scripting.&lt;/p&gt;

&lt;p&gt;Each permission has a numeric value:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;r&lt;/code&gt; = 4&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;w&lt;/code&gt; = 2&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;x&lt;/code&gt; = 1&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Add the values together to get the permission number. For example:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Permissions&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;rwx&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;rw-&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;r-x&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;r--&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;-wx&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;-w-&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;--x&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;---&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
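&lt;p&gt;The same addition can be expressed as a short C sketch that turns a permission string into its octal mode, one digit per class. The helper names &lt;code&gt;triplet_to_digit&lt;/code&gt; and &lt;code&gt;perms_to_mode&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

/* Hypothetical helper: converts a triplet such as "rw-" to its octal digit
   (r=4, w=2, x=1). Returns -1 if the triplet is malformed. */
int triplet_to_digit(const char *t) {
    static const char letters[3] = {'r', 'w', 'x'};
    static const int values[3] = {4, 2, 1};
    int digit = 0;
    for (int i = 0; i &amp;lt; 3; i++) {
        if (t[i] == letters[i])
            digit += values[i];
        else if (t[i] != '-')
            return -1;               /* also stops at a premature '\0' */
    }
    return t[3] == '\0' ? digit : -1;
}

/* Hypothetical helper: converts a 9-character string such as "rwxr-xr-x"
   to a mode like 0755. Returns -1 on malformed input. */
int perms_to_mode(const char *perms) {
    if (strlen(perms) != 9)
        return -1;
    int mode = 0;
    for (int i = 0; i &amp;lt; 3; i++) {
        char triplet[4] = {perms[3 * i], perms[3 * i + 1], perms[3 * i + 2], '\0'};
        int digit = triplet_to_digit(triplet);
        if (digit &amp;lt; 0)
            return -1;
        mode = mode * 8 + digit;     /* append one octal digit per class */
    }
    return mode;
}

int main(void) {
    const char *examples[] = {"rwxr-xr-x", "rw-r--r--", "rw-------"};
    for (int i = 0; i &amp;lt; 3; i++)
        printf("%s -&amp;gt; %03o\n", examples[i], (unsigned)perms_to_mode(examples[i]));
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The program prints 755, 644, and 600 for the three strings, the same values used in the &lt;code&gt;chmod&lt;/code&gt; examples later in this section.&lt;/p&gt;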

&lt;p&gt;The syntax for using numeric mode is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod &lt;/span&gt;XYZ filename
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where &lt;code&gt;X&lt;/code&gt; is the permission for the user, &lt;code&gt;Y&lt;/code&gt; is the permission for the group, and &lt;code&gt;Z&lt;/code&gt; is the permission for others.&lt;/p&gt;

&lt;p&gt;A few common examples of using numeric mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1) EXAMPLE: Owner can read/write, everyone else can read only.&lt;/span&gt;
&lt;span class="nb"&gt;chmod &lt;/span&gt;644 somefile

&lt;span class="c"&gt;# 2) EXAMPLE: Owner can read/write/execute; group and others can read and execute. This is very common for executable scripts and programs.&lt;/span&gt;
&lt;span class="nb"&gt;chmod &lt;/span&gt;755 somefile

&lt;span class="c"&gt;# 3) EXAMPLE: Private file: only owner can read/write, no permissions for group and others.&lt;/span&gt;
&lt;span class="nb"&gt;chmod &lt;/span&gt;600 secret.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;⚠️ Avoid setting permissions like 777 unless absolutely necessary. Giving everyone full access is convenient but unsafe. Even on a personal system, good permission habits prevent accidental damage and security issues.&lt;/p&gt;
&lt;/blockquote&gt;
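&lt;p&gt;Under the hood, a command like &lt;code&gt;chmod 644 somefile&lt;/code&gt; ends up asking the kernel, through the &lt;code&gt;chmod(2)&lt;/code&gt; family of system calls, to set mode &lt;code&gt;0644&lt;/code&gt;. The C sketch below assumes &lt;code&gt;/tmp&lt;/code&gt; is writable. It creates a scratch file with &lt;code&gt;mkstemp(3)&lt;/code&gt;, applies a few octal modes, and reads them back with &lt;code&gt;stat(2)&lt;/code&gt;. The helper &lt;code&gt;set_and_read_mode&lt;/code&gt; and the file name pattern are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;sys/stat.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;

/* Hypothetical helper: applies mode with chmod(2), then returns the
   permission bits reported by stat(2), or -1 if either call fails. */
int set_and_read_mode(const char *path, mode_t mode) {
    struct stat st;
    if (chmod(path, mode) != 0 || stat(path, &amp;amp;st) != 0)
        return -1;
    return (int)(st.st_mode &amp;amp; 07777);   /* drop the file-type bits */
}

int main(void) {
    char path[] = "/tmp/chmod_demo_XXXXXX";
    int fd = mkstemp(path);               /* glibc creates it with mode 0600 */
    if (fd == -1) {
        perror("mkstemp");
        return 1;
    }
    close(fd);

    const mode_t modes[] = {0644, 0755, 0600};   /* leading 0 = octal literal */
    for (size_t i = 0; i &amp;lt; sizeof modes / sizeof modes[0]; i++) {
        int got = set_and_read_mode(path, modes[i]);
        if (got &amp;lt; 0)
            perror("chmod/stat");
        else
            printf("chmod %03o -&amp;gt; %03o\n", (unsigned)modes[i], (unsigned)got);
    }

    unlink(path);   /* clean up the scratch file */
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note the leading &lt;code&gt;0&lt;/code&gt; on each mode: in C, &lt;code&gt;0644&lt;/code&gt; is an octal literal, whereas a plain &lt;code&gt;644&lt;/code&gt; would be read as decimal and set entirely different bits.&lt;/p&gt;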

&lt;h2&gt;
  
  
  Changing File Ownership
&lt;/h2&gt;

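&lt;p&gt;In addition to permissions, Linux also allows you to change the ownership of files and directories. This is done using the &lt;code&gt;chown&lt;/code&gt; command. On Linux, changing a file's owner requires root privileges (more precisely, the &lt;code&gt;CAP_CHOWN&lt;/code&gt; capability), so you will usually run it with &lt;code&gt;sudo&lt;/code&gt;.&lt;/p&gt;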

&lt;p&gt;The syntax for &lt;code&gt;chown&lt;/code&gt; is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chown &lt;/span&gt;newuser somefile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



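&lt;p&gt;You can also change the group ownership using the &lt;code&gt;chgrp&lt;/code&gt; command. A file's owner may change its group to any group they are a member of, while root can assign any group:&lt;br&gt;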
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chgrp &lt;/span&gt;newgroup somefile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Understanding the setuid bit
&lt;/h2&gt;

&lt;p&gt;In addition to the standard read (&lt;code&gt;r&lt;/code&gt;), write (&lt;code&gt;w&lt;/code&gt;), and execute (&lt;code&gt;x&lt;/code&gt;) permissions, Linux supports three special permission bits that modify how files and directories behave:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;setuid (Set User ID)&lt;/strong&gt;: The setuid bit applies to executable files.&lt;/li&gt;
&lt;li&gt;
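&lt;strong&gt;setgid (Set Group ID)&lt;/strong&gt;: On an executable file, the process runs with the file's group as its effective group ID. On a directory, new files inherit the directory's group instead of the creator's primary group.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;sticky bit&lt;/strong&gt;: The sticky bit applies mainly to directories. In a writable directory such as &lt;code&gt;/tmp&lt;/code&gt;, it allows only a file's owner, the directory's owner, or root to delete or rename that file.&lt;/li&gt;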
&lt;/ul&gt;

&lt;p&gt;These special bits are powerful and commonly used on multi-user systems. They enable controlled privilege elevation and safer collaboration, but when misused, they can create serious security risks.&lt;/p&gt;

&lt;p&gt;The focus of this section will be on the &lt;strong&gt;setuid&lt;/strong&gt; bit, as it is the most relevant to understanding privilege escalation risks in Linux. When you run an executable file, the process that gets created inherits your user ID (the current user of the shell).&lt;/p&gt;

&lt;p&gt;Some programs need temporary elevated privileges to perform specific tasks. This is where the setuid bit comes into play. &lt;strong&gt;When the setuid bit is set on an executable file, the program runs with the effective user ID of the file’s owner&lt;/strong&gt;. If that owner is root, a regular user can run that specific program with root privileges.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We advise running this example in a disposable Ubuntu 24.04 Docker container. Start the container with: &lt;code&gt;docker run --rm -it ubuntu:24.04 bash -lc "set -e; apt update; apt install -y sudo nano build-essential; useradd -m -s /bin/bash test; usermod -aG sudo test; echo 'test ALL=(ALL) NOPASSWD:ALL' &amp;gt; /etc/sudoers.d/test; chmod 440 /etc/sudoers.d/test; exec su - test"&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's look at a common example of a SetUID program: the &lt;code&gt;passwd&lt;/code&gt; command. The &lt;code&gt;passwd&lt;/code&gt; command allows users to change their passwords, but updating the password file requires root privileges. To allow regular users to change their passwords, the &lt;code&gt;passwd&lt;/code&gt; executable has the setuid bit set and is owned by root.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;test&lt;/span&gt;@d4f8d29c759d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="sb"&gt;`&lt;/span&gt;which passwd&lt;span class="sb"&gt;`&lt;/span&gt;
&lt;span class="nt"&gt;-rwsr-xr-x&lt;/span&gt; 1 root root 64152 May 30  2024 /usr/bin/passwd
&lt;span class="nb"&gt;test&lt;/span&gt;@d4f8d29c759d:~&lt;span class="err"&gt;$&lt;/span&gt;
&lt;span class="nb"&gt;test&lt;/span&gt;@d4f8d29c759d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cp&lt;/span&gt; /usr/bin/passwd ./mypasswd
&lt;span class="nb"&gt;test&lt;/span&gt;@d4f8d29c759d:~&lt;span class="err"&gt;$&lt;/span&gt;
&lt;span class="nb"&gt;test&lt;/span&gt;@d4f8d29c759d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt; mypasswd
&lt;span class="nt"&gt;-rwxr-xr-x&lt;/span&gt; 1 &lt;span class="nb"&gt;test test &lt;/span&gt;64152 Feb  5 13:55 mypasswd
&lt;span class="nb"&gt;test&lt;/span&gt;@d4f8d29c759d:~&lt;span class="err"&gt;$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you copy the &lt;code&gt;passwd&lt;/code&gt; command to your home directory, the setuid bit is not preserved, and the file is owned by your user (&lt;code&gt;test&lt;/code&gt;). Therefore, when you run &lt;code&gt;./mypasswd&lt;/code&gt;, it runs with your user privileges rather than root's, and the command will not work as intended.&lt;/p&gt;
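&lt;p&gt;The &lt;code&gt;s&lt;/code&gt; in the owner's execute position of &lt;code&gt;-rwsr-xr-x&lt;/code&gt; corresponds to the &lt;code&gt;S_ISUID&lt;/code&gt; bit in the file mode. As a small sketch, a program can check for that bit with &lt;code&gt;stat(2)&lt;/code&gt;; the helper &lt;code&gt;has_setuid&lt;/code&gt; is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;sys/stat.h&amp;gt;

/* Hypothetical helper: returns 1 if the setuid bit is set on path,
   0 if it is not, and -1 if stat(2) fails. */
int has_setuid(const char *path) {
    struct stat st;
    if (stat(path, &amp;amp;st) != 0)
        return -1;
    return (st.st_mode &amp;amp; S_ISUID) ? 1 : 0;
}

int main(int argc, char *argv[]) {
    for (int i = 1; i &amp;lt; argc; i++) {
        int r = has_setuid(argv[i]);
        if (r &amp;lt; 0)
            perror(argv[i]);
        else
            printf("%s: setuid bit %s\n", argv[i], r ? "set" : "not set");
    }
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Given the listings above, compiling this as &lt;code&gt;has_setuid&lt;/code&gt; and running &lt;code&gt;./has_setuid /usr/bin/passwd ./mypasswd&lt;/code&gt; in the same container should report the bit as set for &lt;code&gt;/usr/bin/passwd&lt;/code&gt; and not set for the copy.&lt;/p&gt;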

&lt;p&gt;To demonstrate both that setuid changes a process's privileges and why setuid programs are risky, we will create a simple SetUID demo program. That tiny program will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;print real UID vs effective UID&lt;/li&gt;
&lt;li&gt;try a root-only action (write into /root/...)&lt;/li&gt;
&lt;li&gt;sleep so we can inspect the running process with ps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create a root-only target file&lt;/strong&gt;: This is the file our demo program will try to write into. Only root should have access to it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run: &lt;code&gt;sudo bash -lc 'echo "TOP SECRET" &amp;gt; /root/secret.txt &amp;amp;&amp;amp; chmod 600 /root/secret.txt &amp;amp;&amp;amp; ls -l /root/secret.txt'&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Expected output:
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-rw------- 1 root root 11 Feb  5 14:02 /root/secret.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create the demo program&lt;/strong&gt;: Open a new file in the nano text editor with &lt;code&gt;nano suid_demo.c&lt;/code&gt; and paste the following code:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;unistd.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdlib.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;errno.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;string.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;fcntl.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;seconds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argc&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;seconds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;atoi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;seconds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;uid_t&lt;/span&gt; &lt;span class="n"&gt;ruid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getuid&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;uid_t&lt;/span&gt; &lt;span class="n"&gt;euid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;geteuid&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Real UID: %u&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ruid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Effective UID: %u&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;euid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Attempt a root-only action: append to a file inside /root&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/root/secret.txt"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;fd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;O_WRONLY&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;O_APPEND&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"open(%s) failed: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strerror&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errno&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Appended by suid_demo&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strlen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"write() failed: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strerror&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errno&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"SUCCESS: appended to %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Sleeping %d seconds so you can inspect me with ps...&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;fflush&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compile it&lt;/strong&gt;: &lt;code&gt;gcc suid_demo.c -o suid_demo&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Run it normally&lt;/strong&gt;: &lt;code&gt;./suid_demo 60&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;You should see output similar to this&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;test@d4f8d29c759d:~$ id
uid=1001(test) gid=1001(test) groups=1001(test),27(sudo)

test@d4f8d29c759d:~$ ./suid_demo 60
Real UID: 1001
Effective UID: 1001
open(/root/secret.txt) failed: Permission denied
Sleeping 60 seconds so you can inspect me with ps...
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Both the real UID and the effective UID are 1001 (our user), so the attempt to open &lt;code&gt;/root/secret.txt&lt;/code&gt; fails with "Permission denied".&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Running the &lt;code&gt;ps ajf&lt;/code&gt; command from another terminal while the program is still sleeping produces the following output. The last line shows our &lt;code&gt;suid_demo&lt;/code&gt; process running with UID 1001:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@d4f8d29c759d:/# ps ajf
PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    0  1774  1774  1774 pts/1     1783 Ss       0   0:00 bash
1774  1783  1783  1774 pts/1     1783 R+       0   0:00  \_ ps ajf
    0     1     1     1 pts/0     1782 Ss       0   0:00 su - test
    1  1746  1746     1 pts/0     1782 S     1001   0:00 -bash
1746  1782  1782     1 pts/0     1782 S+    1001   0:00  \_ ./suid_demo 60
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Turn it into a SetUID-root binary&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run these commands:
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo chown &lt;/span&gt;root:root suid_demo
&lt;span class="nb"&gt;sudo chmod &lt;/span&gt;4755 suid_demo &lt;span class="c"&gt;# set the setuid bit&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt; suid_demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We should now see that the owner is root and the permissions show an &lt;code&gt;s&lt;/code&gt; in place of the user execute bit:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;test@d4f8d29c759d:~$ ls -l suid_demo
-rwsr-xr-x 1 root root 16488 Feb  5 14:06 suid_demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Run it again as a normal user&lt;/strong&gt;: &lt;code&gt;./suid_demo 60&lt;/code&gt;. This time the program successfully appends to &lt;code&gt;/root/secret.txt&lt;/code&gt;. The effective UID is now 0 (root), while the real UID is still our user (1001):&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;test@d4f8d29c759d:~$ ./suid_demo 60
Real UID: 1001
Effective UID: 0
SUCCESS: appended to /root/secret.txt
Sleeping 60 seconds so you can inspect me with ps...
test@d4f8d29c759d:~$
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Running &lt;code&gt;ps ajf&lt;/code&gt; from another terminal confirms the effective UID is 0:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@d4f8d29c759d:/# ps ajf
PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    0  1774  1774  1774 pts/1     1793 Ss       0   0:00 bash
1774  1793  1793  1774 pts/1     1793 R+       0   0:00  \_ ps ajf
    0     1     1     1 pts/0     1792 Ss       0   0:00 su - test
    1  1746  1746     1 pts/0     1792 S     1001   0:00 -bash
1746  1792  1792     1 pts/0     1792 S+       0   0:00  \_ ./suid_demo 60
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;This example confirms that the setuid bit allowed our program to run with the privileges of the file owner (root), even though we executed it as a normal user. To verify that the append worked, read the file with &lt;code&gt;sudo cat /root/secret.txt&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;test&lt;/span&gt;@d4f8d29c759d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo cat&lt;/span&gt; /root/secret.txt
TOP SECRET
Appended by suid_demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This simple demo shows why SetUID-root binaries are treated as high-risk and must be written extremely carefully. A SetUID program runs with the effective user ID of the file owner rather than that of the calling user. When the owner is root, any user allowed to execute the file therefore runs it with root privileges. If such a program contains a bug, for example unsafe input handling, path confusion, or unsafe command execution, that bug becomes a privilege-escalation vector.&lt;/p&gt;

&lt;p&gt;The setuid bit dates back to early Unix, when privilege management was simpler and more coarse-grained. The basic model was:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;root → full privileges&lt;/li&gt;
&lt;li&gt;non-root → very limited privileges&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;SetUID was introduced as a mechanism to let non-root users perform specific privileged operations through carefully controlled programs. Starting with Linux kernel 2.2, more advanced security mechanisms were introduced, most notably Linux capabilities. &lt;/p&gt;

&lt;p&gt;Capabilities break the all-powerful root privilege into many smaller, specific privileges that can be granted independently. This follows the principle of least privilege: give a program only the exact permissions it needs, nothing more.&lt;/p&gt;

&lt;blockquote&gt;
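&lt;p&gt;Historically, the &lt;code&gt;ping&lt;/code&gt; command required the setuid bit because it needed to open raw network sockets, which is a privileged operation. Many modern distributions instead grant it only the &lt;code&gt;CAP_NET_RAW&lt;/code&gt; capability, so it no longer runs with full root privileges.&lt;/p&gt;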
&lt;/blockquote&gt;

&lt;p&gt;SetUID is still used for some core system tools, such as &lt;code&gt;passwd&lt;/code&gt; and &lt;code&gt;sudo&lt;/code&gt;, but it should be considered a legacy elevation mechanism and applied only when absolutely necessary.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This article is one piece of the Ultimate Container Security Series, an ongoing effort to organize and explain container security concepts in a practical way. If you want to explore related topics or see what’s coming next, the &lt;a href="https://dev.to/0xalphasecurity/ultimate-container-security-series-2628"&gt;series introduction post provides the complete roadmap&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>containers</category>
      <category>docker</category>
      <category>cybersecurity</category>
      <category>linux</category>
    </item>
    <item>
      <title>Chapter 2: Linux System Calls</title>
      <dc:creator>0xAlphaSecurity</dc:creator>
      <pubDate>Wed, 07 Jan 2026 21:54:02 +0000</pubDate>
      <link>https://forem.com/0xalphasecurity/chapter-2-linux-system-calls-1eb</link>
      <guid>https://forem.com/0xalphasecurity/chapter-2-linux-system-calls-1eb</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This post is part of the Ultimate Container Security Series, a structured, multi-part guide covering container security from foundational concepts to runtime protection. For an overview of the series structure, scope, and update schedule, &lt;a href="https://dev.to/0xalphasecurity/ultimate-container-security-series-2628"&gt;see the series introduction post here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To understand how containers work, and how to secure them, it helps to know a few Linux fundamentals. One of these fundamentals is Linux system calls. Later chapters will build on this to explain how containers provide isolation, resource management, and security boundaries.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are Linux System Calls?
&lt;/h2&gt;

&lt;p&gt;Linux splits execution into two main "worlds":&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Userspace&lt;/strong&gt;:  Userspace is where &lt;em&gt;user-facing applications run&lt;/em&gt;: web servers, Chrome, text editors, command-line tools, background services, etc. It's a restricted zone: applications cannot directly access hardware or manage critical system resources on their own. This restriction improves stability: if an application crashes, it usually doesn't crash the whole OS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kernel Space&lt;/strong&gt;: Kernel space is where the Linux kernel &lt;em&gt;runs the core of the operating system&lt;/em&gt;. The kernel controls everything: memory, processes, scheduling, hardware and drivers, filesystems, networking, security, and more. It also interacts directly with the CPU, RAM, disk, and other hardware with full privileges.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0x8nw4nbi3l3pvnm3fk6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0x8nw4nbi3l3pvnm3fk6.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So where do system calls fit in?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Applications run in userspace with lower privileges. If an application wants to do something that requires kernel privileges, like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;opening a file&lt;/li&gt;
&lt;li&gt;reading/writing data&lt;/li&gt;
&lt;li&gt;creating a process&lt;/li&gt;
&lt;li&gt;allocating memory&lt;/li&gt;
&lt;li&gt;sending network traffic&lt;/li&gt;
&lt;li&gt;getting the current time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…it must ask the kernel to do it.&lt;/p&gt;

&lt;p&gt;That request is made through the system call interface, also called the &lt;code&gt;syscall&lt;/code&gt; interface.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Definition (in plain terms):&lt;/strong&gt; A system call is a programmatic way for a user-space application to request a service from the Linux kernel, safely and in a controlled way.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqn6fve5q57aiqy6vc4k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqn6fve5q57aiqy6vc4k.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This distinction exists for security and stability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User programs can't directly touch hardware or kernel memory, because a buggy or malicious program could then corrupt the kernel or other processes.&lt;/li&gt;
&lt;li&gt;System calls provide controlled entry points into the kernel.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Also, not everything needs the kernel. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tokenizing a string happens entirely in userspace.&lt;/li&gt;
&lt;li&gt;But anything involving files, devices, networking, or process management requires &lt;code&gt;syscalls&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Linux has 300+ system calls (the exact number varies by kernel version and CPU architecture). A few examples of common system calls:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What the program wants&lt;/th&gt;
&lt;th&gt;System call&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Read a file&lt;/td&gt;
&lt;td&gt;&lt;code&gt;read()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write a file&lt;/td&gt;
&lt;td&gt;&lt;code&gt;write()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open a file&lt;/td&gt;
&lt;td&gt;&lt;code&gt;open()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Start a new program&lt;/td&gt;
&lt;td&gt;&lt;code&gt;execve()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create a process&lt;/td&gt;
&lt;td&gt;&lt;code&gt;fork()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Allocate memory&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mmap()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Send network data&lt;/td&gt;
&lt;td&gt;&lt;code&gt;send()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Get current time&lt;/td&gt;
&lt;td&gt;&lt;code&gt;clock_gettime()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You can browse the full list via the man page: &lt;a href="https://man7.org/linux/man-pages/man2/syscalls.2.html" rel="noopener noreferrer"&gt;syscalls(2)&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How do System Calls Work?
&lt;/h2&gt;

&lt;p&gt;At a high level, a syscall looks like a normal function call from the programmer's perspective, but under the hood it performs a controlled transition into kernel mode.&lt;/p&gt;

&lt;p&gt;Typical flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user application calls a standard library function (for example &lt;code&gt;read()&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;That function triggers a system call using a system call number.&lt;/li&gt;
&lt;li&gt;The CPU switches from user mode to kernel mode.&lt;/li&gt;
&lt;li&gt;The Linux kernel executes the requested operation.&lt;/li&gt;
&lt;li&gt;Control returns to the application with a result (or an error).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example idea: calling &lt;code&gt;read(fd, buffer, size)&lt;/code&gt; triggers the kernel's read implementation for that file descriptor and returns the number of bytes read (or &lt;code&gt;-1&lt;/code&gt; on error, with details stored in &lt;code&gt;errno&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkb63u8l47ufhvm0u3a5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkb63u8l47ufhvm0u3a5.png" alt=" " width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Small Example in C
&lt;/h2&gt;

&lt;p&gt;As an application developer, you rarely need to invoke syscalls "raw." Usually you use higher-level abstractions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In C/C++: glibc provides wrapper functions (like &lt;code&gt;read()&lt;/code&gt;, &lt;code&gt;write()&lt;/code&gt;, &lt;code&gt;open()&lt;/code&gt;, etc.)&lt;/li&gt;
&lt;li&gt;In Go: you may encounter the &lt;code&gt;syscall&lt;/code&gt; package&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These wrappers:&lt;/p&gt;

&lt;ul&gt;
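&lt;li&gt;place the syscall number and arguments where the kernel expects them (typically CPU registers),&lt;/li&gt;
&lt;li&gt;perform the transition to kernel mode,&lt;/li&gt;
&lt;li&gt;return the result in a familiar way (for example, &lt;code&gt;-1&lt;/code&gt; with &lt;code&gt;errno&lt;/code&gt; set on failure).&lt;/li&gt;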
&lt;/ul&gt;

&lt;p&gt;Here's a minimal C example that uses &lt;code&gt;write()&lt;/code&gt; to print to standard output (file descriptor 1):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;unistd.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Hello, World!&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What's happening step-by-step?&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;write(1, msg, sizeof(msg) - 1)&lt;/code&gt; is called from userspace.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;write()&lt;/code&gt; (from glibc) is a wrapper that prepares the syscall.&lt;/li&gt;
&lt;li&gt;The process enters the kernel through the syscall interface.&lt;/li&gt;
&lt;li&gt;The kernel validates:

&lt;ul&gt;
&lt;li&gt;that file descriptor 1 is valid,&lt;/li&gt;
&lt;li&gt;that the process is allowed to write to it,&lt;/li&gt;
&lt;li&gt;that the buffer points to accessible memory.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The kernel writes the bytes to stdout (often your terminal).&lt;/li&gt;
&lt;li&gt;The kernel returns the number of bytes written, and execution continues in userspace.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Even though the code looks simple, the important takeaway is this:&lt;br&gt;
&lt;strong&gt;any time you interact with files, processes, networking, memory mapping, etc., you're going through system calls.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Containers and System Calls
&lt;/h2&gt;

&lt;p&gt;A key point that many people miss early on: &lt;strong&gt;Containers are just processes running on the host Linux kernel.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That means containers don't have a separate kernel. They share the host kernel, and system calls are the only way container processes interact with that kernel.&lt;/p&gt;

&lt;p&gt;So everything a container does (reading files, opening sockets, creating processes) flows through syscalls.&lt;/p&gt;

&lt;p&gt;The application code uses syscalls the same way whether it runs on the host or inside a container. But containers introduce security implications, because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The container still depends on the host kernel.&lt;/li&gt;
&lt;li&gt;If a process can invoke powerful syscalls (for example, mounting filesystems or loading kernel modules), it may be able to affect the host, not just its own container.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where least privilege matters: &lt;strong&gt;Not all applications need all system calls.&lt;/strong&gt; By restricting which syscalls a containerized application can use, you reduce the attack surface. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;System calls are the "front door" into the kernel. Since containers are just Linux processes sharing the host kernel, every action a container takes ultimately becomes a syscall. That makes syscalls a powerful security control point: if an attacker compromises a containerized app, the damage they can do depends heavily on which syscalls and privileges that process is allowed to use.&lt;/p&gt;

&lt;p&gt;This is why container hardening often focuses on reducing kernel exposure, using least privilege and Linux controls like &lt;code&gt;seccomp&lt;/code&gt; (restricting syscalls), &lt;code&gt;capabilities&lt;/code&gt; (dropping unnecessary privileges), and &lt;code&gt;namespaces&lt;/code&gt;/&lt;code&gt;cgroups&lt;/code&gt; (isolation and resource limits). In later chapters, we'll build directly on this idea to show how containers create boundaries, and how to tighten them.&lt;/p&gt;

&lt;p&gt;A few more resources for learning about Linux system calls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://brennan.io/2016/11/14/kernel-dev-ep3/" rel="noopener noreferrer"&gt;Tutorial - Write a System Call&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.packagecloud.io/the-definitive-guide-to-linux-system-calls/" rel="noopener noreferrer"&gt;The Definitive Guide to Linux System Calls&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;This article is one piece of the Ultimate Container Security Series, an ongoing effort to organize and explain container security concepts in a practical way. If you want to explore related topics or see what’s coming next, the &lt;a href="https://dev.to/0xalphasecurity/ultimate-container-security-series-2628"&gt;series introduction post provides the complete roadmap&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>containers</category>
      <category>docker</category>
      <category>cybersecurity</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Chapter 1: Container Security Threat Model</title>
      <dc:creator>0xAlphaSecurity</dc:creator>
      <pubDate>Sun, 04 Jan 2026 17:45:51 +0000</pubDate>
      <link>https://forem.com/0xalphasecurity/chapter-1-container-security-threat-model-3knd</link>
      <guid>https://forem.com/0xalphasecurity/chapter-1-container-security-threat-model-3knd</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This post is part of the Ultimate Container Security Series, a structured, multi-part guide covering container security from foundational concepts to runtime protection. For an overview of the series structure, scope, and update schedule, &lt;a href="https://dev.to/0xalphasecurity/ultimate-container-security-series-2628"&gt;see the series introduction post here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Before talking about tools, configurations, or best practices, it is important to understand what we are actually trying to protect and from whom. This is where a threat model comes in. A threat model provides a structured way to reason about security risks.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is a Threat Model?
&lt;/h2&gt;

&lt;p&gt;To understand threat modeling, it helps to clearly distinguish a few related concepts.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;risk&lt;/strong&gt; is a potential problem and the impact it would have if it occurred.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;threat&lt;/strong&gt; is a possible path that could lead to that risk becoming real.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;mitigation&lt;/strong&gt; is a countermeasure that reduces the likelihood of a threat succeeding or limits its impact.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;A simple example can help illustrate this:&lt;/em&gt; Imagine you work from home and rely on a laptop that contains sensitive work data. The risk is that this data could be stolen. The threats are the different ways this might happen: someone breaking into your house, stealing the laptop from your car, or tricking you into installing malware. Mitigations could include locking your doors, encrypting the disk, or using strong authentication.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The key point is that a single risk can have many different threats, and each threat may require different mitigations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Threat Models Differ
&lt;/h2&gt;

&lt;p&gt;Risks vary significantly depending on the context.&lt;/p&gt;

&lt;p&gt;A bank holding customer funds will focus heavily on preventing financial theft. An e-commerce platform may prioritize fraud and availability. A personal blog might be most concerned with account takeover or defacement.&lt;/p&gt;

&lt;p&gt;Regulatory environments also affect risk. For example, leaking personal data may be primarily a reputational issue in some regions, while in others, such as the European Union, regulations like GDPR can result in substantial financial penalties.&lt;/p&gt;

&lt;p&gt;Because risks differ, the importance of specific threats and the appropriate mitigations will also differ. This is why threat modeling is not about finding a single “correct” list of threats, but about systematically identifying and prioritizing the threats that matter in a given environment.&lt;/p&gt;

&lt;p&gt;Threat modeling is the process of identifying and enumerating potential threats to a system by examining its components, interfaces, and modes of operation. Done well, it highlights where a system is most exposed and where security efforts will have the greatest effect.&lt;/p&gt;

&lt;p&gt;The goal of this chapter is to establish a shared mental model that will be used throughout the rest of the series. We will look at different ways container threats are commonly structured and explain which approach this series will follow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Ways to Structure a Container Threat Model
&lt;/h2&gt;

&lt;p&gt;There is no single comprehensive threat model that fits all environments. However, several well-established approaches are commonly used in container security. Each emphasizes a different perspective.&lt;/p&gt;

&lt;h3&gt;
  
  
  Component-Based (Data-Centric) Threat Model
&lt;/h3&gt;

&lt;p&gt;One common approach is to &lt;strong&gt;model threats around the core components of a containerized environment&lt;/strong&gt;. This is the approach taken by &lt;a href="https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-190.pdf" rel="noopener noreferrer"&gt;NIST Special Publication 800-190: Application Container Security Guide&lt;/a&gt;, which identifies major risks associated with the following components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Image risks&lt;/li&gt;
&lt;li&gt;Registry risks&lt;/li&gt;
&lt;li&gt;Orchestrator risks&lt;/li&gt;
&lt;li&gt;Container risks &lt;/li&gt;
&lt;li&gt;Host OS risks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9z0vkrhsypkl30iu4t1a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9z0vkrhsypkl30iu4t1a.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;NIST Special Publication 800-190 was published in 2017 and, unfortunately, has not been updated. As a result, it does not cover some of the newer threats and technologies that have emerged since then.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This type of threat model is often called component-based, because each component represents a distinct surface an attacker might target.&lt;/p&gt;

&lt;p&gt;By examining each component independently, this model helps architects and operators understand where controls must exist and how failures in one area can affect the rest of the system.&lt;/p&gt;

&lt;p&gt;NIST SP 800-190 uses this component-based structure to remain vendor-neutral and applicable across different container platforms.&lt;/p&gt;

&lt;p&gt;In this series, we use the same underlying risks identified by NIST, but reorganize them along the container lifecycle to make them easier to learn and apply in practice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Attacker-Centric Threat Model
&lt;/h3&gt;

&lt;p&gt;Another way to structure a threat model is to focus on who the attacker is, rather than where they attack.&lt;/p&gt;

&lt;p&gt;In the book &lt;em&gt;Container Security&lt;/em&gt;, Liz Rice describes a threat model based on the different actors that may interact with or compromise a containerized system, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;External attackers&lt;/strong&gt; attempting to access a deployment from outside.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internal attackers&lt;/strong&gt; who have gained some level of access.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Malicious insiders&lt;/strong&gt;, such as developers or administrators with legitimate privileges.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inadvertent insiders&lt;/strong&gt; who accidentally introduce security issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application processes&lt;/strong&gt; that may misuse their programmatic access.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzsiudxa3r47d09lixrr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzsiudxa3r47d09lixrr.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This attacker-centric approach is particularly useful for understanding intent, privilege levels, and realistic attack paths. It is often used in incident response, threat hunting, and security reviews.&lt;/p&gt;

&lt;h3&gt;
  
  
  MITRE ATT&amp;amp;CK for Containers
&lt;/h3&gt;

&lt;p&gt;A more technique-driven approach is provided by &lt;a href="https://attack.mitre.org/matrices/enterprise/containers/" rel="noopener noreferrer"&gt;MITRE ATT&amp;amp;CK for Containers&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This framework categorizes adversary behavior into tactics and techniques across the stages of an attack lifecycle, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;initial access,&lt;/li&gt;
&lt;li&gt;execution,&lt;/li&gt;
&lt;li&gt;persistence,&lt;/li&gt;
&lt;li&gt;privilege escalation,&lt;/li&gt;
&lt;li&gt;defense evasion,&lt;/li&gt;
&lt;li&gt;credential access,&lt;/li&gt;
&lt;li&gt;discovery,&lt;/li&gt;
&lt;li&gt;lateral movement,&lt;/li&gt;
&lt;li&gt;and impact.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs3rhikm5rkpqhwjorlch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs3rhikm5rkpqhwjorlch.png" alt=" " width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Image source: &lt;a href="https://attack.mitre.org/matrices/enterprise/containers/" rel="noopener noreferrer"&gt;MITRE ATT&amp;amp;CK for Containers&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MITRE ATT&amp;amp;CK is especially useful for detection and response, as it helps security teams understand how attacks progress over time and which behaviors to monitor at runtime. While powerful, it is often too detailed to serve as an introductory threat model on its own.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lifecycle-Based Threat Model
&lt;/h3&gt;

&lt;p&gt;The final approach is to &lt;strong&gt;structure threats along the container lifecycle&lt;/strong&gt;. This model focuses on when threats occur rather than where or who is involved. It aligns closely with how containerized systems are built and operated in practice.&lt;/p&gt;

&lt;p&gt;In this series, we use the following lifecycle stages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Build&lt;/strong&gt; - PART 2: Secure Container Image Building&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distribution&lt;/strong&gt; - PART 3: Registries &amp;amp; Supply Chain Security&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment&lt;/strong&gt; - PART 4: Host &amp;amp; Container Platform Security&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime&lt;/strong&gt; - PART 5: Container Runtime Security&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F12sek2hsfk12imciz3qj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F12sek2hsfk12imciz3qj.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This approach lets us reason about threats in the same order in which containers move through the system, while still incorporating insights from the component-based and attacker-centric models.&lt;/p&gt;

&lt;p&gt;There is no single threat model that fits every environment, but many of the threats discussed in this series are common to most container deployments, regardless of scale or platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Identifying Attack Vectors
&lt;/h2&gt;

&lt;p&gt;Once a threat model is defined, the next step is to identify &lt;strong&gt;attack vectors&lt;/strong&gt;, the concrete entry points an attacker may use to exploit the system.&lt;/p&gt;

&lt;p&gt;In containerized environments, attack vectors can appear at &lt;strong&gt;every stage of the container lifecycle&lt;/strong&gt; and across &lt;strong&gt;multiple components&lt;/strong&gt;. Common examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vulnerable application code running inside containers&lt;/li&gt;
&lt;li&gt;Insecure container image build configurations&lt;/li&gt;
&lt;li&gt;Compromised or untrusted container image supply chains&lt;/li&gt;
&lt;li&gt;Insecure image storage and retrieval mechanisms&lt;/li&gt;
&lt;li&gt;Weak host machine and kernel security&lt;/li&gt;
&lt;li&gt;Exposed or over-privileged credentials and tokens&lt;/li&gt;
&lt;li&gt;Flat or poorly segmented container networking&lt;/li&gt;
&lt;li&gt;Container escape vulnerabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  General Security Principles
&lt;/h2&gt;

&lt;p&gt;Regardless of the specific threat model or deployment architecture, certain &lt;strong&gt;security principles consistently reduce risk&lt;/strong&gt; in containerized environments.&lt;/p&gt;

&lt;p&gt;These principles do not replace a threat model. Instead, they &lt;strong&gt;guide how mitigations are selected and applied&lt;/strong&gt;, and they will be revisited throughout the rest of this series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regular audits and timely updates&lt;/li&gt;
&lt;li&gt;Applying the principle of least privilege&lt;/li&gt;
&lt;li&gt;Network segmentation and isolation&lt;/li&gt;
&lt;li&gt;Runtime visibility and enforcement&lt;/li&gt;
&lt;li&gt;Continuous image scanning&lt;/li&gt;
&lt;li&gt;Defense in depth rather than single controls&lt;/li&gt;
&lt;li&gt;Reducing the exposed attack surface&lt;/li&gt;
&lt;li&gt;Limiting the blast radius of a compromise&lt;/li&gt;
&lt;li&gt;Clear segregation of duties between roles and systems&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;This article is one piece of the Ultimate Container Security Series, an ongoing effort to organize and explain container security concepts in a practical way. If you want to explore related topics or see what’s coming next, the &lt;a href="https://dev.to/0xalphasecurity/ultimate-container-security-series-2628"&gt;series introduction post provides the complete roadmap&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>containers</category>
      <category>docker</category>
      <category>cybersecurity</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Ultimate Container Security Series</title>
      <dc:creator>0xAlphaSecurity</dc:creator>
      <pubDate>Sun, 04 Jan 2026 17:30:57 +0000</pubDate>
      <link>https://forem.com/0xalphasecurity/ultimate-container-security-series-2628</link>
      <guid>https://forem.com/0xalphasecurity/ultimate-container-security-series-2628</guid>
      <description>&lt;p&gt;Welcome to the &lt;strong&gt;Ultimate Container Security Series&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Over the past few years, I’ve been working extensively with containers and their security aspects. I’ve read many great books, blogs, and tutorials, and I’ve also run containerized workloads in production environments. During this time, I felt the need for a well-organized, practical series that covers the most important container security topics in one place.&lt;/p&gt;

&lt;p&gt;This series is my attempt to bring together the &lt;strong&gt;key concepts, real-world scenarios, and practical recipes&lt;/strong&gt; needed to understand and apply container security effectively. The goal is to help readers &lt;strong&gt;learn the topics faster&lt;/strong&gt;, with examples that can be applied directly in production environments. Whenever possible, I’ll include &lt;strong&gt;working examples&lt;/strong&gt; to make the concepts easier to understand.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0kfim0xmqrdkl8sgjng6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0kfim0xmqrdkl8sgjng6.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Series Structure
&lt;/h1&gt;

&lt;p&gt;The series will be divided into five main parts, and each part will consist of multiple chapters. Each chapter will be published as a separate blog post.&lt;/p&gt;

&lt;p&gt;You can use this post as the main reference to see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what has already been written,&lt;/li&gt;
&lt;li&gt;when a chapter was last updated,&lt;/li&gt;
&lt;li&gt;and which topics are coming next.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I recommend bookmarking this page, as it will also be used for future announcements and updates related to the series.&lt;/p&gt;

&lt;h1&gt;
  
  
  Main Outline
&lt;/h1&gt;

&lt;h3&gt;
  
  
  PART 1: Foundations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/0xalphasecurity/chapter-1-container-security-threat-model-3knd"&gt;Chapter 1: Container Security Threat Model&lt;/a&gt; (updated: 4.1.2026)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/0xalphasecurity/chapter-2-linux-system-calls-1eb"&gt;Chapter 2: Linux System Calls&lt;/a&gt; (updated: 7.1.2026)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/0xalphasecurity/chapter-3-linux-file-permissions-34ac"&gt;Chapter 3: Linux File Permissions&lt;/a&gt; (updated: 5.2.2026)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/0xalphasecurity/chapter-4-linux-capabilities-5452"&gt;Chapter 4: Linux Capabilities&lt;/a&gt; (updated: 6.3.2026)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/0xalphasecurity/chapter-5-linux-control-groups-cgroups-2mbm"&gt;Chapter 5: Linux Control Groups (cgroups)&lt;/a&gt; (updated: 22.3.2026)&lt;/li&gt;
&lt;li&gt;Chapter 6: Linux Namespaces (writing in progress)&lt;/li&gt;
&lt;li&gt;Chapter 7: Understanding Container Isolation (writing in progress)&lt;/li&gt;
&lt;li&gt;Chapter 8: Container-Related Vulnerabilities and Attacks&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  PART 2: Secure Container Image Building
&lt;/h3&gt;

&lt;h3&gt;
  
  
  PART 3: Registries &amp;amp; Supply Chain Security
&lt;/h3&gt;

&lt;h3&gt;
  
  
  PART 4: Host &amp;amp; Container Platform Security
&lt;/h3&gt;

&lt;h3&gt;
  
  
  PART 5: Container Runtime Security
&lt;/h3&gt;

&lt;h1&gt;
  
  
  Goals of the Series
&lt;/h1&gt;

&lt;p&gt;The goal of this series is to provide an up-to-date overview of the most important container security topics, supported by real examples and best-practice solutions.&lt;/p&gt;

&lt;p&gt;Container technologies evolve very quickly, so this series is not static. Chapters may be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;updated,&lt;/li&gt;
&lt;li&gt;expanded,&lt;/li&gt;
&lt;li&gt;reorganized,&lt;/li&gt;
&lt;li&gt;or extended with new topics over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The date listed next to each chapter in this post indicates when that chapter was last updated.&lt;/p&gt;

&lt;h1&gt;
  
  
  Release Plan
&lt;/h1&gt;

&lt;p&gt;Writing a full course takes time. My goal is to publish &lt;strong&gt;most of the planned topics by July 2026&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I plan to update the series &lt;strong&gt;weekly&lt;/strong&gt;, and in some cases even &lt;strong&gt;daily&lt;/strong&gt;, depending on the topic and its complexity.&lt;/p&gt;

&lt;p&gt;The full course content will also be &lt;strong&gt;available on GitHub&lt;/strong&gt;, including examples and supporting materials.&lt;/p&gt;

&lt;p&gt;If there is a &lt;strong&gt;specific container security topic you are interested in, feel free to leave it in the comments&lt;/strong&gt;. I’ll do my best to cover it as part of this series.&lt;/p&gt;

</description>
      <category>containers</category>
      <category>docker</category>
      <category>cybersecurity</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
