Forem: Ivan Zykov

I Squeezed an Entire MLOps Pipeline into 10 Lines of YAML

Ivan Zykov — Tue, 31 Mar 2026 11:14:49 +0000

Every ML project I worked on in GitLab had the same problem: a bloated .gitlab-ci.yml with hand-rolled MLflow integration, custom validation scripts, and manual model registration. Copy it to the next project, tweak the paths, fix the bugs you already fixed last time. By the fifth project you don't remember which config has the working version of MLFLOW_RUN_ID passthrough between jobs.

So I built a GitLab CI/CD component that replaces all of that with 10 lines of YAML.

Before vs After

Here's what a typical MLOps pipeline looked like before — and this is the shortened version:

stages: [validate, train, evaluate, register]

validate-data:
  stage: validate
  image: python:3.12
  script:
    - pip install pandas great_expectations
    - python scripts/validate.py --data data/train.csv --check-nulls --threshold 0.05
  artifacts:
    paths: [validation_report.json]

train-model:
  stage: train
  image: python:3.12
  variables:
    MLFLOW_TRACKING_URI: "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/ml/mlflow"
  script:
    - pip install mlflow scikit-learn pandas
    - python scripts/train.py --data data/train.csv
    - echo "MLFLOW_RUN_ID=$(cat run_id.txt)" >> train.env
  artifacts:
    reports:
      dotenv: train.env
    paths: [model/, metrics.json]

evaluate-model:
  stage: evaluate
  image: python:3.12
  needs: [{job: train-model, artifacts: true}]
  variables:
    MLFLOW_TRACKING_URI: "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/ml/mlflow"
  script:
    - pip install mlflow
    - python scripts/evaluate.py --run-id $MLFLOW_RUN_ID --threshold 0.85
    - echo "EVAL_PASSED=$(cat eval_result.txt)" >> evaluate.env
  artifacts:
    reports:
      dotenv: evaluate.env

register-model:
  stage: register
  image: python:3.12
  needs: [{job: train-model, artifacts: true}, {job: evaluate-model, artifacts: true}]
  rules:
    - if: $EVAL_PASSED == "true"
  variables:
    MLFLOW_TRACKING_URI: "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/ml/mlflow"
  script:
    - pip install mlflow
    - python scripts/register.py --run-id $MLFLOW_RUN_ID --model-name my-model

And this doesn't even include DVC, pip caching, retry logic, or error handling. Each project also had its own validate.py, evaluate.py, register.py — each with its own implementation of auto_configure_mlflow, its own argument parsing, its own bugs.

Now the same thing:

stages: [validate, train, evaluate, register]

include:
  - component: gitlab.com/netOpyr/gitlab-mlops-component/full-pipeline@1.0.0
    inputs:
      model_name: wine-classifier
      training_script: scripts/train.py
      training_args: '--data data/train.csv --test-data data/test.csv'
      data_path: data/train.csv
      framework: sklearn
      metric_name: accuracy
      min_threshold: '0.85'

These 10 lines give you 4 jobs:

validate --> train --> evaluate --> register
   |           |          |            |
 schema      MLflow    accuracy    Model Registry
 nulls       autolog   >= 0.85    (if eval passed)
 drift       metrics   vs prod

All the boilerplate scripts now live inside the component. You only write the training script.

What Each Stage Does

validate checks your data before training starts: schema validation (are all columns present?), null ratio per column (default threshold 5%), and optionally data drift detection via Evidently. Supports Great Expectations suites and custom Python check scripts too.
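To make the schema and null-ratio checks concrete, here is a minimal sketch in plain Python. It is not the component's actual code: the function names `null_ratios` and `validate` are hypothetical, and only the 5% default threshold comes from the description above.

```python
import csv
import io


def null_ratios(rows, columns):
    """Return the fraction of empty values for each column."""
    counts = {col: 0 for col in columns}
    total = 0
    for row in rows:
        total += 1
        for col in columns:
            value = row.get(col)
            # Treat missing and whitespace-only cells as nulls.
            if value is None or value.strip() == "":
                counts[col] += 1
    if total == 0:
        return {col: 0.0 for col in columns}
    return {col: counts[col] / total for col in columns}


def validate(csv_text, required_columns, max_null_ratio=0.05):
    """Return a list of problems; an empty list means the data passed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    # Schema check: every required column must be present.
    missing = [col for col in required_columns if col not in header]
    if missing:
        return [f"missing column: {col}" for col in missing]
    # Null check: no column may exceed the allowed ratio of empty cells.
    ratios = null_ratios(reader, required_columns)
    return [
        f"{col}: null ratio {ratio:.2f} exceeds {max_null_ratio:.2f}"
        for col, ratio in ratios.items()
        if ratio > max_null_ratio
    ]
```

A CI job built on this idea would fail the pipeline whenever the returned list is non-empty.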

train wraps your training script in an MLflow session. It auto-configures the tracking URI from GitLab CI variables, creates an experiment and run, enables autolog for your framework (sklearn, PyTorch, TensorFlow, XGBoost, LightGBM), and passes MLFLOW_RUN_ID to your script via environment variable. Your script stays a normal Python file — it works locally and in Jupyter just the same.

evaluate pulls metrics from MLflow and runs them through quality gates. Gate 1: absolute threshold (e.g. accuracy >= 0.85). Gate 2 (optional): comparison with the current production model from Model Registry. Supports higher_is_better: false for loss metrics.
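The two gates can be sketched as a single function. This is an illustration under assumptions, not the component's implementation: the name `passes_gates` is hypothetical, and I assume that with higher_is_better: false the threshold acts as an upper bound and that a tie with the production model passes.

```python
def passes_gates(candidate, min_threshold, production=None, higher_is_better=True):
    """Return True if the candidate metric passes both quality gates.

    Gate 1 compares the metric with an absolute threshold.
    Gate 2 (skipped when production is None) compares it with the
    metric of the current production model.
    """
    if higher_is_better:
        gate1 = candidate >= min_threshold
        gate2 = production is None or candidate >= production
    else:
        # For loss-like metrics, lower values are better.
        gate1 = candidate <= min_threshold
        gate2 = production is None or candidate <= production
    return gate1 and gate2
```

For example, an accuracy of 0.9 with a threshold of 0.85 passes on its own, but fails once a production model with accuracy 0.92 is taken into account.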

register pushes the model to GitLab Model Registry with metadata: alias (staging by default), commit SHA, pipeline ID, metrics. Works on all GitLab tiers — on Free, alias assignment silently falls back to tags.

DVC Integration

If your data lives in S3/MinIO:

include:
  - component: .../train@1.0.0
    inputs:
      training_script: scripts/train.py
      model_name: my-model
      dvc_enabled: true
      dvc_remote: minio
      dvc_files: 'data/train.csv.dvc data/test.csv.dvc'
      dvc_push: true
      dvc_push_paths: 'model/'

The component installs DVC, pulls data before training, and pushes artifacts back after. Credentials go through CI/CD variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_ENDPOINT_URL.

When 10 Lines Aren't Enough

For more complex setups you can include each stage separately:

include:
  - component: .../validate@1.0.0
    inputs:
      data_path: data/train.csv
      enable_drift: true
      reference_data_path: data/reference.csv

  - component: .../train@1.0.0
    inputs:
      training_script: scripts/train.py
      model_name: my-model
      image_suffix: pytorch-gpu
      framework: pytorch
      tags: ["gpu"]

  - component: .../evaluate@1.0.0
    inputs:
      model_name: my-model
      metric_name: val_loss
      min_threshold: '0.1'
      higher_is_better: false

  - component: .../register@1.0.0
    inputs:
      model_name: my-model
      alias: staging

This way you get per-stage GPU runners, custom images, and conditional execution. You can also train multiple models in parallel using the as parameter to give unique names to jobs.

Framework Support

Each framework has a dedicated Docker image selected via image_suffix:

Suffix	Frameworks	GPU
`sklearn`	scikit-learn, matplotlib	No
`boosting`	XGBoost, LightGBM, scikit-learn	No
`pytorch`	PyTorch (CPU)	No
`pytorch-gpu`	PyTorch + CUDA 12.4	Yes
`tensorflow`	TensorFlow (CPU)	No
`tensorflow-gpu`	TensorFlow + CUDA 12.4	Yes

All images include Python 3.12, MLflow, and pandas. Need extra dependencies? Set requirements_file: requirements.txt or bring your own image via image_registry_base.

Try It

Two options:

Fork the example project — a wine classifier with 3 files total. Just create an access token with API scope and add it as MLOPS_ACCESS_TOKEN in CI/CD variables.
Add the component to your existing project. Drop your training script into scripts/ and configure the inputs.

The component is published in the GitLab CI/CD Catalog.

What's Next

Coming soon: BuildKit-based image builds, retry logic for flaky MLflow requests, and GitLab Environments integration.

Found a bug or missing a feature? Open an issue or MR.


My first container without Docker

Ivan Zykov — Mon, 29 Dec 2025 13:43:06 +0000

Containerization technologies are firmly lodged in my mind, as they probably are for most readers of this article. It might seem that you can just write a Dockerfile and leave it at that. But you always want to learn something new and dig deeper into topics you have already mastered. For this reason, I decided to figure out how containers are implemented in Linux-based systems and then create my own "container" using only the command line.

What powers containers in Linux?

First, you need to understand what containerization technology is based on. There are two mechanisms in the Linux kernel: namespaces and cgroups (control groups). They provide the isolation and resource control that we all love about containers. Let's take a look at both mechanisms in order.

Namespace

Namespaces allow us to isolate system resources between processes. With their help, we can create a separate virtual system while formally remaining in the host system. This brief explanation may not have clarified much, so let's look at an example.
Consider a container created from the alpine image. Let's start it with an interactive shell:

docker run -it alpine /bin/sh

Now let's create a new process in the container and check the output of the ps command:

sleep 1000 &
ps -a

We get:

PID   USER     TIME  COMMAND
    1 root      0:00 /bin/sh
   29 root      0:00 sleep 1000
   30 root      0:00 ps -a

Note that the PID of the sleep process is 29. Now let's try to find the same process on the host machine. To do this, we determine the container ID and use the command that lists the processes running inside a container:

docker top <container ID>

As a result, we get:

UID     PID       PPID      C    STIME    TTY      TIME        CMD
root    172147    172124    0    Feb05    pts/0    00:00:00    /bin/sh
root    173602    172147    0    Feb05    pts/0    00:00:00    sleep 1000

Let's pay attention to two columns: PID and PPID (parent PID). They indicate the PID of the process itself and its parent, but in the host system. Let's check it out:

ps aux | grep -E '173602|172147'

We get:

root      172147  0.0  0.0   1736   908 pts/0    Ss+  Feb05   0:00 /bin/sh
root      173602  0.0  0.0   1624   980 pts/0    S    Feb05   0:00 sleep 1000

Which is exactly what we needed to prove! To sum up, the container knows nothing about the host machine and considers itself an independent system. In reality, all of its processes run on the host; they are simply placed in the container's namespaces. This creates the illusion of a separate, independent system.
I hope this example has clarified namespaces a little. In it, we looked at one of the eight namespace types. Now let's briefly go over each one:

Mount - isolation of file system mount points. Allows you to set your own file system hierarchy;
UTS - host name isolation. Allows each container to set its own host name;
PID - process ID isolation. Allows you to create a separate process tree;
Network - isolation of network interfaces and routing tables;
IPC - isolation of interprocess communication (IPC) resources;
User - isolation of user and group IDs. Allows you to create separate users for each container, including root;
Cgroup - isolation of the cgroup hierarchy view. A process sees its own cgroup as the root and cannot see the rest of the host's cgroup tree. The resource limits themselves are set by cgroups, not by this namespace;
Time - isolation of the monotonic and boot-time clocks. The wall-clock time stays shared with the host.

To create a new namespace in Linux, there is a command called unshare. We will take a closer look at it a little later.
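You can also see which namespaces a process belongs to without Docker: the kernel exposes them as symbolic links in /proc/<PID>/ns, with targets such as pid:[4026531836]. Below is a small sketch that reads these links; the helper names `parse_ns_link` and `namespaces_of` are my own, and the code assumes a Linux system with /proc mounted.

```python
import os
import re

# Link targets in /proc/<pid>/ns look like "pid:[4026531836]".
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (kind, inode number)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def namespaces_of(pid="self", proc_root="/proc"):
    """Map each entry of /proc/<pid>/ns to its namespace inode number."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            target = os.readlink(os.path.join(ns_dir, name))
            _, inode = parse_ns_link(target)
        except (OSError, ValueError):
            # Skip entries we cannot read or do not recognise.
            continue
        result[name] = inode
    return result
```

Two processes share a namespace when the inode numbers match. Calling namespaces_of(173602) on the host (with enough privileges to read another user's /proc entries) and comparing the result with namespaces_of() should show a different pid entry for the containerized sleep process from the example above.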

Cgroups

Control groups are a Linux kernel mechanism that allows you to manage process resources. With its help, you can limit and isolate the use of CPU, memory, network, and disk resources.
There are two versions of cgroups: v1 and v2. In most modern systems, you will encounter the second version, which is also the one systemd uses. The main difference between the versions is in how the constraint tree is built. In the first version, a node was created for each type of constraint, and groups were added under it. In the second version, each group has its own node, which contains all the necessary constraints. To better understand this, let's take a look at the visualization of the v1 and v2 trees:

#v1
/sys/fs/cgroup/
├── cpu
│   ├── group1/
│   │   ├── tasks
│   │   ├── cgroup.procs
│   │   ├── cpu.shares
│   │   └── ...
│   ├── group2/
│   │   ├── tasks
│   │   ├── cgroup.procs
│   │   ├── cpu.shares
│   │   └── ...
│   └── ...
├── memory
│   ├── group1/
│   │   ├── tasks
│   │   ├── cgroup.procs
│   │   ├── memory.limit_in_bytes
│   │   └── ...
│   ├── group2/
│   │   ├── tasks
│   │   ├── cgroup.procs
│   │   ├── memory.limit_in_bytes
│   │   └── ...
│   └── ...
└── ...

#v2
/sys/fs/cgroup/
├── group1/
│   ├── cgroup.procs
│   ├── cpu.max
│   ├── cpu.weight
│   ├── memory.current
│   ├── memory.max
│   └── ...
├── group2/
│   ├── cgroup.procs
│   ├── cpu.max
│   ├── cpu.weight
│   ├── memory.current
│   ├── memory.max
│   └── ...
└── ...

Now let's take a look at how cgroups work using the example of a Docker container. First, let's start the container, limiting its resources (2 cores and 512 MB):

docker run -d --cpus="2" --memory="512m" nginx

Next, we will find the group for this container using find:

find /sys/fs/cgroup -name '*<container ID>*'

Next, let's check the contents of the cpu.max and memory.max files in the directory we found:

# cpu.max
200000 100000

# memory.max
536870912

Which is what we needed to prove!
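To read these numbers: cpu.max holds a quota and a period in microseconds, so 200000 out of every 100000 means two full cores, and memory.max is a byte count, where 536870912 is 512 MiB. The sketch below does the same arithmetic in code; the function names are my own, and the value max is the kernel's way of saying "no limit".

```python
def parse_cpu_max(text):
    """Convert the contents of cpu.max into a number of cores.

    The file holds "<quota> <period>" in microseconds, or "max <period>"
    when no CPU limit is set (returned as None).
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def parse_memory_max(text):
    """Convert the contents of memory.max into bytes (None if unlimited)."""
    value = text.strip()
    if value == "max":
        return None
    return int(value)
```

With the values from the container above, parse_cpu_max("200000 100000") gives 2.0 and parse_memory_max("536870912") equals 512 * 1024 * 1024.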

Creating a container without Docker

We have covered the basic theory we need. Now let's move on to practice and resort to the magic of the command line.
First, let's create the container's file system structure and install busybox in the /bin directory:

# Create the root directory of the container and navigate to it.
mkdir ~/container && cd ~/container
# Create the main system directories and navigate to /bin.
mkdir -p ./{proc,sys,dev,tmp,bin,root,etc} && cd bin
# Install busybox.
wget https://www.busybox.net/downloads/binaries/1.35.0-x86_64-linux-musl/busybox
# Grant execution rights
chmod +x busybox
# Create symlinks for all commands available in busybox 
./busybox --list | xargs -I {} ln -s busybox {}
# Return to the root directory of the container
cd ~/container
# Add the PATH variable to the /etc/profile file
echo 'export PATH=/bin' > ~/container/etc/profile

We will also add entries to the /etc/passwd and /etc/group files so that we are root within the isolated system:

echo "root:x:0:0:root:/root:/bin/sh" > ~/container/etc/passwd
echo "root:x:0:" > ~/container/etc/group

Next, we will mount the system directories:

# Mount devices using existing ones
sudo mount --bind /dev ~/container/dev
# Mount processes
sudo mount -t proc none ~/container/proc
# Mount the sysfs file system
sudo mount -t sysfs none ~/container/sys
# Mount the tmpfs file system
sudo mount -t tmpfs none ~/container/tmp

Note: to unmount these directories later, you can use the command:

sudo umount ~/container/{proc,sys,dev,tmp}

We have prepared the file system for our container. Now let's move on to creating isolation. To do this, we will use the command:

unshare -f -p -m -n -i -u -U --map-root-user --mount-proc=./proc \
    /bin/chroot ~/container /bin/sh -c "source /etc/profile && exec /bin/sh"

Let's take a closer look at it:
-f - fork. Run the command as a child process of unshare rather than in unshare itself, so that it becomes the first process (PID 1) in the new PID namespace;
-p - PID namespace;
-m - mount namespace;
-n - network namespace;
-i - IPC namespace;
-u - UTS namespace;
-U - user namespace;
--map-root-user - map the active user's uid and gid to root inside the container;
--mount-proc - mount proc inside the container;
/bin/chroot ~/container - change the root directory;
/bin/sh -c "source /etc/profile && exec /bin/sh" - start a shell that applies the /etc/profile file and then replaces itself with an interactive shell.

Great! We got our container. Now we need to limit resources. To do this, we will open a new session on the host and perform a series of actions:

# Create a new group. My system uses cgroups v2, so
# the directory will be automatically configured to work with resources.
sudo mkdir /sys/fs/cgroup/my_container
# Write a limit of 2 processor cores
echo "200000 100000" | sudo tee /sys/fs/cgroup/my_container/cpu.max
# Allocate a maximum of 512MB of memory
echo 536870912 | sudo tee /sys/fs/cgroup/my_container/memory.max

Next, we need to determine the PID of the container. To do this, we will use the command:

ps aux | grep -E '/bin/sh$'

Take the PID from the second column and add it to the cgroup.procs file:

echo <PID> | sudo tee /sys/fs/cgroup/my_container/cgroup.procs

That completes the basic configuration. We have created an isolated system and added resource restrictions. But we would like to make it a little more functional, so let's set up a virtual network between the host and the container:

# Create a pair of virtual interfaces
sudo ip link add veth-host type veth peer name veth-container
# Bring up the interface on the host
sudo ip link set veth-host up
# Assign any free address in your network to the host interface
# I am using 192.168.1.123/24
sudo ip addr add 192.168.1.123/24 dev veth-host
# Move veth-container to the container namespace
# Here you need to specify the PID of the container you used before
sudo ip link set veth-container netns <PID>
# Bring up the interface inside the container
sudo nsenter --net=/proc/<PID>/ns/net ip link set veth-container up
# Assign any free address on your network to the container interface
# I am using 192.168.1.124/24
sudo nsenter --net=/proc/<PID>/ns/net ip addr add 192.168.1.124/24 dev veth-container
# Configure the default gateway for traffic routing
sudo nsenter --net=/proc/<PID>/ns/net ip route add default via 192.168.1.123
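The default route above only works if both ends of the veth pair sit in the same subnet, because the container has to reach 192.168.1.123 directly. Before running the commands with your own addresses, you can check this with Python's standard ipaddress module. This is just a sanity check I find handy, and the function name `same_subnet` is my own.

```python
import ipaddress


def same_subnet(host_cidr, container_cidr):
    """Check that two interface addresses share a network but differ."""
    host = ipaddress.ip_interface(host_cidr)
    container = ipaddress.ip_interface(container_cidr)
    # Same network means the gateway is reachable without extra routes;
    # the addresses themselves must not collide.
    return host.network == container.network and host.ip != container.ip
```

For the addresses used here, same_subnet("192.168.1.123/24", "192.168.1.124/24") returns True.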

We have brought up all the necessary interfaces. Now we need to configure routing:

# Allow packet forwarding
echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
# Add a NAT rule for masquerading outgoing packets from the network 
# 192.168.1.0/24 through the interface that faces the external network. For me, this is enp3s0.
# Masquerading masks packets leaving the container so that they look
# like packets sent from the host
sudo iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -o enp3s0 -j MASQUERADE
# Add a rule to allow packet forwarding
sudo iptables -A FORWARD -s 192.168.1.0/24 -o enp3s0 -j ACCEPT
# Add a rule to allow incoming packets
sudo iptables -A FORWARD -d 192.168.1.0/24 -m state --state RELATED,ESTABLISHED -j ACCEPT

Great! We've created our first container. Obviously, there's still a lot that can be configured, such as DNS, which isn't working right now. How to handle that is left to the reader.