DEV Community

omkar shelke
omkar shelke

Posted on

2 1 1 1 1

SecurityContext in Kubernetes

1. Introduction to SecurityContext in Kubernetes

A SecurityContext in Kubernetes defines privilege and access control settings for pods or containers, allowing you to control how processes run, access resources, and interact with the system. It is a critical component for securing Kubernetes workloads by enforcing least-privilege principles.

  • Pod-Level SecurityContext: Applies security settings to all containers in a pod and can affect the pod’s volumes. It’s defined under spec.securityContext.
  • Container-Level SecurityContext: Applies to a specific container and can override pod-level settings for that container. It’s defined under spec.containers[].securityContext.

The key difference is scope:

  • Pod-level settings provide a baseline for all containers and volumes in the pod.
  • Container-level settings allow fine-grained customization for individual containers, overriding pod-level settings where applicable.

2. Pod-Level SecurityContext

The pod-level securityContext is defined in the pod’s spec and applies to all containers in the pod unless overridden by a container-level securityContext. It also applies to certain volume-related settings (e.g., fsGroup and seLinuxOptions).

Fields in Pod-Level SecurityContext

Here’s a comprehensive list of fields available at the pod level, their purpose, and examples:

  1. runAsUser:

    • Purpose: Specifies the user ID (UID) for all containers’ processes in the pod.
    • Use Case: Ensures containers don’t run as root, reducing the risk of privilege escalation.
    • Example: A web server pod where all containers should run as a non-root user for security.
     apiVersion: v1
     kind: Pod
     metadata:
       name: web-server-pod
     spec:
       securityContext:
         runAsUser: 1000  # All containers run as UID 1000
       containers:
       - name: nginx
         image: nginx
         ports:
         - containerPort: 80
    

    In this example, a web server (e.g., Nginx) runs as UID 1000, preventing root-level access even if the container is compromised.

  2. runAsGroup:

    • Purpose: Sets the primary group ID (GID) for all containers’ processes.
    • Use Case: Controls group ownership for files created by containers, useful for shared volumes.
    • Example: A pod with a shared volume where files need consistent group ownership.
     apiVersion: v1
     kind: Pod
     metadata:
       name: shared-volume-pod
     spec:
       securityContext:
         runAsUser: 1000
         runAsGroup: 3000  # Primary group ID for processes
       volumes:
       - name: shared-data
         emptyDir: {}
       containers:
       - name: app
         image: busybox
         command: ["sh", "-c", "echo hello > /data/testfile && sleep 1h"]
         volumeMounts:
         - name: shared-data
           mountPath: /data
    

    Files created in the /data volume will be owned by GID 3000, ensuring consistent group access.

  3. runAsNonRoot:

    • Purpose: Ensures all containers run as a non-root user (UID ≠ 0). If set to true, Kubernetes rejects the pod if any container tries to run as root.
    • Use Case: Enforce a policy where no container in the pod can run as root.
    • Example: A corporate policy requires all pods to run non-root for compliance.
     apiVersion: v1
     kind: Pod
     metadata:
       name: non-root-pod
     spec:
       securityContext:
         runAsNonRoot: true  # Enforces non-root user
       containers:
       - name: app
         image: nginx
         ports:
         - containerPort: 80
    

    If the container tries to run as root, the pod will fail to start.

  4. fsGroup:

    • Purpose: Sets the group ID for volume ownership and permissions. Kubernetes applies this GID to volumes that support ownership management (e.g., emptyDir, persistentVolumeClaim).
    • Use Case: Ensures files in a shared volume are accessible by a specific group, such as in a multi-container pod.
    • Example: A pod with a shared volume for a data processing application.
     apiVersion: v1
     kind: Pod
     metadata:
       name: data-processing-pod
     spec:
       securityContext:
         runAsUser: 1000
         fsGroup: 2000  # Volume files owned by GID 2000
       volumes:
       - name: data-vol
         emptyDir: {}
       containers:
       - name: processor
         image: busybox
         command: ["sh", "-c", "echo data > /data/output && sleep 1h"]
         volumeMounts:
         - name: data-vol
           mountPath: /data
    

    Files in /data will be owned by GID 2000, ensuring group-level access control.

  5. supplementalGroups:

    • Purpose: Adds additional group IDs to container processes, beyond the primary runAsGroup.
    • Use Case: Grants access to resources owned by multiple groups, such as shared storage.
    • Example: A pod accessing multiple shared volumes with different group ownerships.
     apiVersion: v1
     kind: Pod
     metadata:
       name: multi-group-pod
     spec:
       securityContext:
         runAsUser: 1000
         runAsGroup: 3000
         supplementalGroups: [4000, 5000]  # Additional group memberships
       containers:
       - name: app
         image: busybox
         command: ["sh", "-c", "sleep 1h"]
    

    Processes in the container belong to GIDs 3000, 4000, and 5000, allowing access to resources owned by these groups.

  6. supplementalGroupsPolicy (Kubernetes v1.33+, beta):

    • Purpose: Controls how supplementary groups are calculated. Options are:
      • Merge: Merges groups from the container image’s /etc/group with fsGroup and supplementalGroups.
      • Strict: Only uses groups specified in fsGroup, supplementalGroups, or runAsGroup, ignoring /etc/group.
    • Use Case: Avoid unintended group memberships from the container image for stricter security.
    • Example: A pod requiring strict group control for compliance.
     apiVersion: v1
     kind: Pod
     metadata:
       name: strict-groups-pod
     spec:
       securityContext:
         runAsUser: 1000
         runAsGroup: 3000
         supplementalGroups: [4000]
         supplementalGroupsPolicy: Strict  # Only specified groups are used
       containers:
       - name: app
         image: busybox
         command: ["sh", "-c", "sleep 1h"]
    

    The container process will only have GIDs 3000 and 4000, ignoring any groups defined in the image’s /etc/group.

  7. fsGroupChangePolicy:

    • Purpose: Controls how Kubernetes changes ownership and permissions for volumes. Options are:
      • OnRootMismatch: Only changes permissions if the volume’s root directory doesn’t match the expected fsGroup.
      • Always: Always changes permissions when the volume is mounted.
    • Use Case: Optimize pod startup time for large volumes by reducing unnecessary permission changes.
    • Example: A pod with a large persistent volume.
     apiVersion: v1
     kind: Pod
     metadata:
       name: large-volume-pod
     spec:
       securityContext:
         runAsUser: 1000
         fsGroup: 2000
         fsGroupChangePolicy: OnRootMismatch  # Optimize permission changes
       volumes:
       - name: data
         persistentVolumeClaim:
           claimName: data-pvc
       containers:
       - name: app
         image: busybox
         volumeMounts:
         - name: data
           mountPath: /data
    

    This reduces startup time by only changing permissions when necessary.

  8. seLinuxOptions:

    • Purpose: Assigns SELinux labels to containers and volumes for access control.
    • Use Case: Enforce mandatory access control in environments with SELinux enabled (e.g., Red Hat systems).
    • Example: A pod running in an SELinux-enabled cluster.
     apiVersion: v1
     kind: Pod
     metadata:
       name: selinux-pod
     spec:
       securityContext:
         seLinuxOptions:
           level: "s0:c123,c456"  # SELinux label for processes and volumes
       containers:
       - name: app
         image: busybox
         command: ["sh", "-c", "sleep 1h"]
    

    All containers and volumes use the specified SELinux label, ensuring compliance with SELinux policies.

  9. seLinuxChangePolicy (Kubernetes v1.33+, beta):

    • Purpose: Controls SELinux relabeling behavior. Options are:
      • MountOption: Uses mount options for faster relabeling (requires SELinuxMount feature gate).
      • Recursive: Recursively relabels all files in the volume.
    • Use Case: Optimize SELinux relabeling for performance or allow multiple pods with different labels to share a volume.
    • Example: A pod opting out of mount-based relabeling for compatibility.
     apiVersion: v1
     kind: Pod
     metadata:
       name: selinux-recursive-pod
     spec:
       securityContext:
         seLinuxOptions:
           level: "s0:c123,c456"
         seLinuxChangePolicy: Recursive  # Recursive relabeling
       containers:
       - name: app
         image: busybox
         command: ["sh", "-c", "sleep 1h"]
    

    This ensures recursive relabeling, allowing multiple pods with different SELinux labels to share a volume.

  10. procMount (Kubernetes v1.33+, beta):

    • Purpose: Controls the /proc filesystem’s mount behavior. Options are:
      • Default: Masks certain /proc paths (e.g., /proc/kcore) and makes others read-only.
      • Unmasked: Exposes all /proc paths, useful for nested container runtimes.
    • Use Case: Running containers within containers (e.g., Docker-in-Docker).
    • Example: A pod running a CI/CD pipeline with nested containers.
      apiVersion: v1
      kind: Pod
      metadata:
        name: dind-pod
      spec:
        securityContext:
          procMount: Unmasked  # Expose full /proc
        hostUsers: false  # Required for Unmasked
        containers:
        - name: docker
          image: docker:dind
          command: ["dockerd"]
    

    This allows the Docker daemon to access the full /proc filesystem for container management.

Real-Life Example for Pod-Level SecurityContext

Scenario: A company runs a microservices application with multiple pods, each containing multiple containers (e.g., an app and a logging sidecar). To comply with security policies, all containers must run as non-root, and shared volumes must be accessible by a specific group.

apiVersion: v1
kind: Pod
metadata:
  name: microservice-pod
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
    runAsNonRoot: true
  volumes:
  - name: logs
    emptyDir: {}
  containers:
  - name: app
    image: my-app:1.0
    volumeMounts:
    - name: logs
      mountPath: /logs
  - name: log-collector
    image: fluentd
    volumeMounts:
    - name: logs
      mountPath: /logs
Enter fullscreen mode Exit fullscreen mode

Explanation:

  • All containers run as UID 1000 and GID 3000.
  • The logs volume is owned by GID 2000 (fsGroup), ensuring both containers can write to it.
  • runAsNonRoot: true enforces non-root execution, aligning with compliance requirements.

3. Container-Level SecurityContext

The container-level securityContext is defined under spec.containers[].securityContext and applies only to the specific container. It can override pod-level settings for that container but doesn’t affect volumes.

Fields in Container-Level SecurityContext

Here’s a comprehensive list of fields available at the container level:

  1. runAsUser:

    • Purpose: Overrides the pod-level runAsUser for the specific container.
    • Use Case: A specific container needs to run as a different user (e.g., root for administrative tasks).
    • Example: A pod with a sidecar requiring root privileges.
     apiVersion: v1
     kind: Pod
     metadata:
       name: mixed-user-pod
     spec:
       securityContext:
         runAsUser: 1000
       containers:
       - name: app
         image: nginx
         ports:
         - containerPort: 80
       - name: admin-tool
         image: busybox
         command: ["sh", "-c", "sleep 1h"]
         securityContext:
           runAsUser: 0  # Runs as root, overriding pod-level setting
    
  2. runAsGroup:

    • Purpose: Overrides the pod-level runAsGroup for the container’s primary group ID.
    • Use Case: A container needs a different primary group for specific access requirements.
    • Example: A container accessing a volume with a unique group.
     apiVersion: v1
     kind: Pod
     metadata:
       name: custom-group-pod
     spec:
       securityContext:
         runAsGroup: 3000
       containers:
       - name: app
         image: busybox
         command: ["sh", "-c", "sleep 1h"]
         securityContext:
           runAsGroup: 4000  # Overrides pod-level runAsGroup
    
  3. runAsNonRoot:

    • Purpose: Enforces non-root execution for the specific container, overriding pod-level settings.
    • Use Case: Ensure a specific container adheres to non-root policies, even if the pod allows root.
    • Example: A sidecar container must run non-root for security.
     apiVersion: v1
     kind: Pod
     metadata:
       name: non-root-sidecar-pod
     spec:
       containers:
       - name: app
         image: nginx
       - name: sidecar
         image: busybox
         command: ["sh", "-c", "sleep 1h"]
         securityContext:
           runAsNonRoot: true
    
  4. capabilities:

    • Purpose: Adds or drops Linux capabilities for the container.
    • Use Case: Grant specific privileges (e.g., NET_ADMIN) without full root access.
    • Example: A container needs to manage network interfaces.
     apiVersion: v1
     kind: Pod
     metadata:
       name: network-admin-pod
     spec:
       containers:
       - name: network-tool
         image: busybox
         command: ["sh", "-c", "sleep 1h"]
         securityContext:
           capabilities:
             add: ["NET_ADMIN"]  # Grants network administration privileges
             drop: ["ALL"]  # Drops all other capabilities
    
  5. privileged:

    • Purpose: Runs the container in privileged mode, granting full root privileges, similar to Docker’s --privileged flag.
    • Use Case: Rare cases where a container needs unrestricted access (e.g., running a system utility).
    • Example: A container running a system diagnostic tool.
     apiVersion: v1
     kind: Pod
     metadata:
       name: privileged-pod
     spec:
       containers:
       - name: diagnostic-tool
         image: busybox
         command: ["sh", "-c", "sleep 1h"]
         securityContext:
           privileged: true  # Full root privileges
    
  6. allowPrivilegeEscalation:

    • Purpose: Controls whether a process can gain more privileges than its parent (e.g., via setuid binaries). Set to false to prevent escalation.
    • Use Case: Prevent containers from escalating privileges in sensitive environments.
    • Example: A container running untrusted code.
     apiVersion: v1
     kind: Pod
     metadata:
       name: no-escalation-pod
     spec:
       containers:
       - name: app
         image: busybox
         command: ["sh", "-c", "sleep 1h"]
         securityContext:
           allowPrivilegeEscalation: false  # Prevents privilege escalation
    
  7. readOnlyRootFilesystem:

    • Purpose: Mounts the container’s root filesystem as read-only, preventing modifications.
    • Use Case: Enhance security by ensuring the container cannot alter its filesystem.
    • Example: A stateless application container.
     apiVersion: v1
     kind: Pod
     metadata:
       name: readonly-pod
     spec:
       containers:
       - name: app
         image: nginx
         securityContext:
           readOnlyRootFilesystem: true  # Root filesystem is read-only
    
  8. seccompProfile:

    • Purpose: Specifies a Seccomp profile to filter system calls, enhancing security.
    • Options:
      • RuntimeDefault: Uses the container runtime’s default profile.
      • Unconfined: No Seccomp filtering.
      • Localhost: Uses a custom profile from the node.
    • Use Case: Restrict dangerous system calls in a container.
    • Example: A container with a default Seccomp profile.
     apiVersion: v1
     kind: Pod
     metadata:
       name: seccomp-pod
     spec:
       containers:
       - name: app
         image: busybox
         securityContext:
           seccompProfile:
             type: RuntimeDefault  # Apply default Seccomp profile
    
  9. appArmorProfile:

    • Purpose: Applies an AppArmor profile to restrict the container’s capabilities.
    • Options: RuntimeDefault, Unconfined, or Localhost with a profile name.
    • Use Case: Restrict a container’s access in an AppArmor-enabled environment.
    • Example: A container with a custom AppArmor profile.
     apiVersion: v1
     kind: Pod
     metadata:
       name: apparmor-pod
     spec:
       containers:
       - name: app
         image: busybox
         securityContext:
           appArmorProfile:
             type: Localhost
             localhostProfile: k8s-apparmor-example-deny-write
    
  10. seLinuxOptions:

    • Purpose: Overrides pod-level SELinux labels for the container.
    • Use Case: Apply a specific SELinux label to a container in an SELinux-enabled cluster.
    • Example: A container requiring a unique SELinux label.
      apiVersion: v1
      kind: Pod
      metadata:
        name: selinux-container-pod
      spec:
        containers:
        - name: app
          image: busybox
          securityContext:
            seLinuxOptions:
              level: "s0:c789,c012"
    
  11. procMount:

    • Purpose: Overrides pod-level procMount settings for the container.
    • Use Case: A specific container needs an unmasked /proc for nested container runtimes.
    • Example: A container running a nested Kubernetes cluster.
      apiVersion: v1
      kind: Pod
      metadata:
        name: nested-k8s-pod
      spec:
        containers:
        - name: k8s
          image: kindest/node
          securityContext:
            procMount: Unmasked  # Full /proc access
    

Real-Life Example for Container-Level SecurityContext

Scenario: A pod runs a web application (Nginx) and a monitoring tool requiring specific privileges (e.g., NET_ADMIN for network diagnostics).

apiVersion: v1
kind: Pod
metadata:
  name: web-monitor-pod
spec:
  securityContext:
    runAsUser: 1000
    runAsNonRoot: true
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
  - name: monitor
    image: busybox
    command: ["sh", "-c", "sleep 1h"]
    securityContext:
      runAsUser: 2000  # Override pod-level runAsUser
      capabilities:
        add: ["NET_ADMIN"]  # Grant network privileges
      allowPrivilegeEscalation: false  # Prevent escalation
Enter fullscreen mode Exit fullscreen mode

Explanation:

  • The pod-level runAsUser: 1000 applies to the Nginx container.
  • The monitor container overrides this with runAsUser: 2000 and adds NET_ADMIN for diagnostics.
  • allowPrivilegeEscalation: false ensures the monitor cannot gain additional privileges.

4. Privileged Mode

Privileged mode (privileged: true) grants a container full root privileges, equivalent to Docker’s --privileged flag. It bypasses most security restrictions, giving the container access to the host’s resources.

When to Use Privileged Mode

  • Use Case: Rare scenarios requiring unrestricted access, such as:
    • Running system utilities (e.g., kernel debugging tools).
    • Nested container runtimes (e.g., Docker-in-Docker).
    • Hardware access (e.g., GPU drivers).
  • Risks: Highly insecure, as it allows the container to affect the host system. Avoid unless absolutely necessary.

Example of Privileged Mode

Scenario: A pod running a Docker-in-Docker (DinD) setup for a CI/CD pipeline.

apiVersion: v1
kind: Pod
metadata:
  name: dind-pod
spec:
  containers:
  - name: docker
    image: docker:dind
    securityContext:
      privileged: true  # Full root privileges
    command: ["dockerd"]
Enter fullscreen mode Exit fullscreen mode

Explanation:

  • The docker:dind image requires privileged mode to run the Docker daemon, which needs access to the host’s kernel and devices.
  • This setup is common in CI/CD pipelines (e.g., Jenkins) but should be tightly controlled due to security risks.

5. Pod-Level vs. Container-Level SecurityContext: Differences

Aspect Pod-Level SecurityContext Container-Level SecurityContext
Scope Applies to all containers in the pod and volumes. Applies only to the specific container.
Fields Available Includes fsGroup, supplementalGroups, seLinuxOptions, fsGroupChangePolicy, supplementalGroupsPolicy, procMount. Includes capabilities, privileged, readOnlyRootFilesystem, seccompProfile, appArmorProfile, and overrides for runAsUser, runAsGroup, runAsNonRoot, seLinuxOptions, procMount.
Volume Impact Affects volume ownership and permissions (fsGroup, seLinuxOptions). Does not affect volumes.
Override Behavior Provides default settings for all containers. Overrides pod-level settings for the container.
Use Case Set baseline security for all containers and volumes (e.g., shared volume permissions). Customize security for a specific container (e.g., add capabilities or run as root).

Example of Pod vs. Container-Level Interaction:

apiVersion: v1
kind: Pod
metadata:
  name: mixed-security-pod
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  containers:
  - name: app
    image: nginx
  - name: privileged-tool
    image: busybox
    securityContext:
      runAsUser: 0  # Override to run as root
      privileged: true  # Full privileges
      capabilities:
        add: ["SYS_ADMIN"]
Enter fullscreen mode Exit fullscreen mode

Explanation:

  • The app container uses the pod-level settings (runAsUser: 1000, runAsGroup: 3000).
  • The privileged-tool container overrides these with runAsUser: 0 and runs in privileged mode with additional capabilities.
  • The fsGroup: 2000 applies to any shared volumes, unaffected by container-level settings.

6. When to Use Pod-Level vs. Container-Level SecurityContext

  • Use Pod-Level SecurityContext:

    • When all containers in the pod share common security settings (e.g., non-root execution, volume ownership).
    • For volume-related settings (fsGroup, seLinuxOptions) that apply across containers.
    • Example: A pod with multiple containers sharing a volume, requiring consistent user and group settings.
  • Use Container-Level SecurityContext:

    • When a specific container needs different settings (e.g., one container needs NET_ADMIN or root privileges).
    • For container-specific restrictions like readOnlyRootFilesystem or seccompProfile.
    • Example: A pod where one container runs a privileged task while others are restricted.

7. Best Practices and Real-Life Considerations

  1. Minimize Privileges:

    • Avoid privileged: true unless absolutely necessary.
    • Use runAsNonRoot: true and drop unnecessary capabilities.
  2. Use Read-Only Filesystems:

    • Set readOnlyRootFilesystem: true for containers that don’t need to write to their filesystem.
  3. Optimize Volume Permissions:

    • Use fsGroupChangePolicy: OnRootMismatch for large volumes to reduce startup time.
    • Use supplementalGroupsPolicy: Strict to avoid unintended group memberships.
  4. Leverage Seccomp and AppArmor:

    • Apply seccompProfile: RuntimeDefault and AppArmor profiles for additional security layers.
  5. SELinux in Secure Environments:

    • Use seLinuxOptions and seLinuxChangePolicy: Recursive in SELinux-enabled clusters for fine-grained control.
  6. Monitor and Audit:

    • Use tools like kubectl describe pod and metrics (e.g., selinux_warning_controller_selinux_volume_conflict) to detect misconfigurations.

8. Conclusion

Pod-level SecurityContext is ideal for setting baseline security policies and managing volume permissions across all containers in a pod. Container-level SecurityContext allows fine-grained customization for individual containers, overriding pod-level settings when needed. Privileged mode should be used sparingly due to its security risks.

Heroku

Amplify your impact where it matters most — building exceptional apps.

Leave the infrastructure headaches to us, while you focus on pushing boundaries, realizing your vision, and making a lasting impression on your users.

Get Started

Top comments (1)

Collapse
 
youngfra profile image
Fraser Young

This is a detailed breakdown of SecurityContext in Kubernetes. Are there any typical scenarios where combining both pod-level and container-level security contexts becomes necessary, or is it generally better to stick to one?

AWS Security LIVE! Stream

Streaming live from AWS re:Inforce

Join AWS Security LIVE! at re:Inforce for real conversations with AWS Partners.

Learn More