Ömer Berat Sezer

What is MLOps? 🧐 A Complete Guide to Architecture and MLOps Tools: On-Prem, AWS, GCP, Azure

AI has become much more popular in the last 3 years. Large language models (GPT, Llama, Gemini, Claude, Nova) are being developed by giant companies (OpenAI, Meta, Google, Anthropic, AWS, Mistral). While they are being developed, do you know how these teams automate, validate, and monitor them?

Are you curious about what is going on behind the scenes (in the kitchen)?


What is MLOps?

MLOps (Machine Learning Operations) is a set of practices that combine Machine Learning (ML), DevOps, and Data Engineering to deploy and maintain ML systems reliably and efficiently in production.

It aims to:

  • Automate the ML lifecycle from data to deployment.
  • Version, validate, and monitor models continuously.
  • Enable collaboration between data scientists, ML engineers, and operations teams.

You can think of it as DevOps for machine learning, but with unique complexities such as model drift, data quality, and reproducibility.

There are 3 maturity levels of MLOps:

  • MLOps level 0: Manual process
  • MLOps level 1: ML pipeline automation
  • MLOps level 2: CI/CD pipeline automation

Ref: https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning

Motivation: Why is MLOps Important?

Without MLOps:

  • Models stay in notebooks, never reaching production.
  • Retraining models is manual and error-prone.
  • No traceability of data or models.
  • Hard to scale experiments or collaborate.

With MLOps:

  • Models can be deployed reliably and quickly.
  • Enables CI/CD for ML pipelines.
  • Provides traceability and governance.
  • Ensures monitoring and retraining loops are in place.

Companies need MLOps to turn their ML investments into real, scalable, and maintainable products.

MLOps Architecture

MLOps level 2: CI/CD pipeline automation

[Figure: MLOps level 2 CI/CD pipeline automation architecture]

Ref: https://ml-ops.org/content/mlops-principles

  • Orchestrated Experimentation: Collaborative, tracked experimentation:
    • Data Validation
    • Data Preparation
    • Model Training
    • Model Evaluation
    • Model Validation
  • CI 1 - ML Train, Test, Package: Automating code integration and model packaging. "You build source code and run various tests. The outputs of this stage are pipeline components (packages, executables, and artifacts) to be deployed in a later stage." (Ref: Google)
  • CD 1 - Pipeline Deployment: Deploying pipelines into various environments. "You deploy the artifacts produced by the CI stage to the target environment. The output of this stage is a deployed pipeline with the new implementation of the model." (Ref: Google)
  • CI 2 - Fully Automated: ML Production Training Pipeline: End-to-end pipeline (ETL → Train → Validate) for production. "The pipeline is automatically executed in production based on a schedule or in response to a trigger. The output of this stage is a trained model that is pushed to the model registry." (Ref: Google)
    • Data Extraction
    • Data Validation
    • Data Preparation
    • Model Training
    • Model Evaluation
    • Model Validation
  • CD 2 - Model Production Serving: Exposing models as APIs or batch processors.
  • Monitoring & Retraining: Keeping models accurate and fresh.

Difference between MLOps levels 1 & 2: level 2 adds CI/CD pipeline automation around the orchestrated experimentation. In level 1, the orchestrated experiment is run manually, not through automated pipelines.

The figure above shows the steps and tools for MLOps. Now, let’s focus on each part under its own subtitle. Let’s dive in.

Feature Store

A Feature Store is a centralized repository for storing, managing, and sharing the features used in machine learning models, for both training and real-time inference.

  • Feature: A feature is an individual measurable property or characteristic used by a machine learning model to make predictions.
  • Raw Data: Raw data is the original, unprocessed information collected from various sources, which may need cleaning or transformation before being used.
  • Feature vs Raw Data: Features are derived from raw data through preprocessing steps to make the data suitable for training machine learning models.

Why it's important:

  • Ensures consistency between training and inference.
  • Promotes feature reuse across teams.
  • Reduces duplication and errors in feature engineering.

| Platform | Tool |
| --- | --- |
| On-prem | Feast, DVC |
| AWS | SageMaker Feature Store |
| GCP | Vertex AI Feature Store |
| Azure | Azure ML Feature Store |
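To make this concrete, here is a minimal sketch of how features could be defined with Feast, the on-prem option in the table above. The entity, file path, and feature names are hypothetical and only for illustration:

```python
# Minimal Feast feature definition sketch (hypothetical entity, path, and feature names).
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Entity: the key that features are joined on, at training and at inference time.
driver = Entity(name="driver", join_keys=["driver_id"])

# Offline source holding the raw, timestamped feature data.
driver_stats_source = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)

# Feature view: a named, versionable group of features served consistently
# to both training pipelines and online inference.
driver_hourly_stats = FeatureView(
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="avg_daily_trips", dtype=Int64),
        Field(name="conv_rate", dtype=Float32),
    ],
    source=driver_stats_source,
)
```

Training code would then read these features from the offline store (historical retrieval), while the serving path reads the same definitions from the online store, which is what keeps training and inference consistent.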

Orchestrated Experiment

This is the initial phase where data scientists run experiments to analyze data, test models, and record experiment metadata.

| Platform | Tool |
| --- | --- |
| On-prem | JupyterLab + MLFlow (or Weights & Biases) |
| AWS | SageMaker Studio + SageMaker Experiments |
| GCP | Vertex AI Workbench + Vertex AI Experiments |
| Azure | Azure ML Studio + Azure ML SDK |
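As a rough sketch of the on-prem option, a single experiment run could be tracked with MLFlow like this; the tracking URI, experiment name, parameters, and metric values are placeholders:

```python
# Sketch: tracking one experiment run with MLFlow (placeholder names and values).
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # assumes a local MLFlow tracking server
mlflow.set_experiment("churn-prediction")         # placeholder experiment name

with mlflow.start_run(run_name="baseline-logreg"):
    # Hyperparameters of this experiment run.
    mlflow.log_param("model_type", "logistic_regression")
    mlflow.log_param("C", 1.0)

    # ... train and evaluate the model here ...

    # Evaluation metrics recorded for later comparison across runs.
    mlflow.log_metric("accuracy", 0.87)
    mlflow.log_metric("f1", 0.81)

    # Plots, model files, and other artifacts can also be attached to the run,
    # e.g. mlflow.log_artifact("confusion_matrix.png")
```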

CI 1 - ML Train, Test, Package

This step handles Continuous Integration for ML code: testing code, building ML components (models), and packaging them into containers or artifacts.

| Platform | Tool |
| --- | --- |
| On-prem | GitLab/GitHub Pipelines (or Jenkins) + Container + MLFlow (or Weights & Biases) |
| AWS | CodeBuild + CodePipeline + ECR |
| GCP | Cloud Build + Artifact Registry |
| Azure | Azure DevOps Pipelines + Container Registry |
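The tests that run in this CI stage are often plain unit tests around the ML code. A minimal, hypothetical pytest gate that blocks packaging if a quickly trained model does not clear a quality bar might look like this:

```python
# test_model_quality.py — hypothetical CI gate run by GitLab/GitHub pipelines or Jenkins.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def train_model(X, y):
    """Training entry point that the CI pipeline imports and tests."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)
    return model


def test_model_meets_accuracy_threshold():
    # Synthetic data keeps the CI job fast and deterministic.
    X, y = make_classification(n_samples=500, n_features=10, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    model = train_model(X_train, y_train)

    # Fail the pipeline (and block packaging) if quality regresses (placeholder threshold).
    assert model.score(X_test, y_test) > 0.7
```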

CD 1 - Pipeline Deployment

Deploying the ML pipeline (not the model yet) to dev/staging/prod environments.

| Platform | Tool |
| --- | --- |
| On-prem | GitLab/GitHub Pipelines + Container + MLFlow (or Weights & Biases) |
| AWS | SageMaker Pipelines + CodePipeline + Container |
| GCP | Vertex AI Pipelines + Cloud Composer + Container |
| Azure | Azure ML Pipelines + Azure DevOps + Container |
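As one hedged example on GCP, a CD job could take the compiled pipeline spec produced by the CI stage and deploy/trigger it on Vertex AI Pipelines roughly like this; the project, region, bucket, and file names are placeholders:

```python
# Sketch: CD job submitting a compiled pipeline definition to Vertex AI Pipelines.
from google.cloud import aiplatform

aiplatform.init(
    project="my-gcp-project",            # placeholder project
    location="us-central1",              # placeholder region
    staging_bucket="gs://my-ml-bucket",  # placeholder bucket
)

# The CI stage produced "training_pipeline.json" (a compiled pipeline spec)
# as its artifact; the CD stage deploys and runs it in the target environment.
job = aiplatform.PipelineJob(
    display_name="training-pipeline-staging",
    template_path="training_pipeline.json",
    pipeline_root="gs://my-ml-bucket/pipeline-root",
    parameter_values={"dataset_uri": "gs://my-ml-bucket/data/train.csv"},
)

job.submit()  # non-blocking; use job.run() to wait for completion
```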

ML Metadata Store - Experiment Tracking

An ML Metadata Store records metadata generated during the ML lifecycle — data lineage, model hyperparameters, training run IDs, evaluation metrics, and more.

Why it's important:

  • Enables experiment reproducibility.
  • Helps trace the origin and evolution of models.
  • Supports automated model governance and auditing.

| Platform | Tool |
| --- | --- |
| On-prem | MLFlow (or Weights & Biases) |
| AWS | SageMaker Experiments |
| GCP | Vertex AI Experiments |
| Azure | Azure ML Experiments & Run History |
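For the on-prem option, the metadata that MLFlow records can later be queried to audit or reproduce a model, for example (the experiment name and metric are placeholders):

```python
# Sketch: querying experiment metadata from an MLFlow tracking server.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # assumes a local MLFlow tracking server

# Find the best recent runs of an experiment by a logged metric.
runs = mlflow.search_runs(
    experiment_names=["churn-prediction"],
    order_by=["metrics.f1 DESC"],
    max_results=5,
)

# Each row carries the run ID, parameters, metrics, and artifact locations —
# the lineage needed for reproducibility and audits.
print(runs[["run_id", "params.model_type", "metrics.f1"]])
```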

CI 2 - Fully Automated: ML Production Training Pipeline

  • Full ML pipeline: Data extraction, validation, preprocessing, training, evaluation, validation.
  • In ML, code doesn't always change, but data or features do — and that alone can trigger new training runs. That is why having separate pipelines is crucial (a minimal sketch follows the table below):
    • Pipeline deployment (i.e., re-train when the data changes).
    • Building & testing the training pipeline (which depends on fresh data, features, and compute).

| Platform | Tool |
| --- | --- |
| On-prem | Airflow + GitLab/GitHub Pipelines + MLFlow + Container |
| AWS | SageMaker Pipelines + Container |
| GCP | Vertex AI Pipelines + Container |
| Azure | Azure ML Pipelines + Container |
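Here is a minimal sketch of such a scheduled production training pipeline as an Airflow DAG (the on-prem option above); the task functions are placeholders for the real extraction, validation, training, and evaluation code:

```python
# Sketch: a scheduled production training pipeline as an Airflow DAG (placeholder tasks).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_data():
    """Pull fresh data from the source systems (placeholder)."""

def validate_data():
    """Check schema and distributions before training (placeholder)."""

def train_model():
    """Fit the model on the prepared features (placeholder)."""

def evaluate_model():
    """Compare the new model against the current production model (placeholder)."""


with DAG(
    dag_id="ml_training_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # or trigger on new data / drift alerts
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_data", python_callable=extract_data)
    validate = PythonOperator(task_id="validate_data", python_callable=validate_data)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    evaluate = PythonOperator(task_id="evaluate_model", python_callable=evaluate_model)

    # The dependency chain mirrors the pipeline steps listed above.
    extract >> validate >> train >> evaluate
```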

Model Registry

A Model Registry is like a version-controlled catalog of models. It tracks model versions, metadata, stage (e.g., staging, production), and lifecycle.

Why it's important:

  • Keeps track of all trained models.
  • Manages promotion between environments (dev → staging → prod).
  • Enables collaboration between data science and operations teams.

| Platform | Tool |
| --- | --- |
| On-prem | MLFlow Model Registry |
| AWS | SageMaker Model Registry |
| GCP | Vertex AI Model Registry |
| Azure | Azure ML Model Registry |
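With the on-prem MLFlow option, registering a freshly trained model and promoting it between stages could look roughly like this (the model name and run ID are placeholders; newer MLFlow versions prefer aliases over stages):

```python
# Sketch: registering a model version and promoting it with the MLFlow registry.
import mlflow
from mlflow.tracking import MlflowClient

mlflow.set_tracking_uri("http://localhost:5000")  # assumes a local MLFlow tracking server

# Register the model logged in a particular training run (placeholder run ID).
result = mlflow.register_model(
    model_uri="runs:/<run_id>/model",
    name="churn-classifier",
)

# Promote that version to "Staging" once validation has passed.
client = MlflowClient()
client.transition_model_version_stage(
    name="churn-classifier",
    version=result.version,
    stage="Staging",
)
```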

CD 2 - Model Production Serving

  • Serving the trained model in production (e.g., real-time API, batch inference).
  • In ML, code doesn't always change, but data or features do — and that alone can trigger new training runs. That is why a separate pipeline for model serving (deploying the newly trained model for inference) is crucial; a small client-side sketch follows the table below.

| Platform | Tools |
| --- | --- |
| On-prem | MLFlow Server, KServe, Seldon Core, TorchServe |
| AWS | SageMaker Endpoints |
| GCP | Vertex AI Prediction |
| Azure | Azure ML Endpoints |
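As a small example of real-time serving with the on-prem MLFlow option, a model started with `mlflow models serve` exposes a REST endpoint that clients can call like this; the host, port, and feature columns are placeholders:

```python
# Sketch: real-time inference against a model served by an MLFlow model server.
import requests

# Payload format expected by MLflow 2.x scoring servers ("dataframe_split" orientation).
payload = {
    "dataframe_split": {
        "columns": ["avg_daily_trips", "conv_rate"],
        "data": [[12, 0.43], [3, 0.91]],
    }
}

response = requests.post(
    "http://localhost:5001/invocations",  # placeholder host and port
    json=payload,
    timeout=10,
)

print(response.json())  # e.g. {"predictions": [...]}
```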

Performance Monitoring

Monitoring model performance post-deployment for drift, accuracy, latency, etc. It may also trigger retraining.

| Platform | Tools |
| --- | --- |
| On-prem | Prometheus + Grafana, MLFlow |
| AWS | SageMaker Model Monitor |
| GCP | Vertex AI Model Monitoring |
| Azure | Azure Monitor + Azure ML Data Drift Monitor |
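As a simple, tool-agnostic illustration of drift detection, a two-sample Kolmogorov–Smirnov test can compare a feature's training distribution with its recent production distribution and flag when retraining might be needed; the data and threshold below are placeholders:

```python
# Sketch: naive data-drift check on a single numeric feature using a KS test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Placeholder distributions: training data vs. recent production traffic.
training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
production_feature = rng.normal(loc=0.4, scale=1.2, size=5_000)  # shifted => drift

statistic, p_value = ks_2samp(training_feature, production_feature)

# If the distributions differ significantly, raise an alert / trigger retraining.
DRIFT_P_VALUE_THRESHOLD = 0.01  # placeholder threshold
if p_value < DRIFT_P_VALUE_THRESHOLD:
    print(f"Drift detected (KS={statistic:.3f}, p={p_value:.4f}) -> trigger retraining")
else:
    print("No significant drift detected")
```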

MLOps Tools Comparison Table

| Stage | On-Premise (Open Source) | AWS | GCP | Azure |
| --- | --- | --- | --- | --- |
| Feature Store | Feast, DVC | SageMaker Feature Store | Vertex AI Feature Store | Azure Feature Store (in preview) |
| Orchestrated Experiment | JupyterLab, MLFlow (or Weights & Biases) | SageMaker Pipelines | Vertex AI Pipelines | Azure ML Pipelines |
| CI 1 - ML Train, Test, Package | GitLab/GitHub Pipelines (or Jenkins) + Container + MLFlow (or Weights & Biases) | AWS CodeBuild + CodePipeline + ECR + Container | Cloud Build + Container | Azure DevOps + GitHub Actions + Container |
| CD 1 – Pipeline Deployment | GitLab/GitHub Pipelines + Container + MLFlow (or Weights & Biases) | SageMaker Pipelines + CodePipeline + Container | Vertex AI Pipelines + Cloud Composer + Container | Azure ML Pipelines + Azure DevOps + Container |
| ML Metadata Store – Experiment Tracking | MLFlow (or Weights & Biases) | SageMaker Experiments | Vertex AI Experiments | Azure ML Tracking |
| CI 2 - Fully Automated: ML Production Training Pipeline | Airflow + GitLab/GitHub Pipelines + MLFlow + Container | SageMaker Pipelines + Container | Vertex AI Pipelines + Container | Azure ML Pipelines + Container |
| Model Registry | MLFlow Registry, ModelDB | SageMaker Model Registry | Vertex AI Model Registry | Azure ML Model Registry |
| CD 2 - Model Production Serving | MLFlow Server, KServe, Seldon Core, TorchServe | SageMaker Endpoints | Vertex AI Prediction | Azure ML Endpoints |
| Performance Monitoring | Prometheus + Grafana, MLFlow | SageMaker Model Monitor | Vertex AI Model Monitoring | Azure Data Collector + Monitor |

Best Practices in MLOps

  • Track Everything: Data, ML Code, ML Models, Evaluation Metrics.
  • Use Feature Stores: For consistency across training and inference.
  • Automate Pipelines: Reduce manual errors and increase reproducibility.
  • CI/CD for ML: Use automated tests and deployments for ML code and models.
  • Monitor Continuously: Detect drift, anomalies, or failures early.
  • Model Governance: Ensure auditability, compliance, and security.
  • Decouple Components: Modularize training, serving, monitoring.

Conclusion

In today’s AI-driven world, building a model is no longer the hard part; deploying, maintaining, and scaling it is. That’s where MLOps comes in. It bridges the gap between data science and operations, enabling teams to move from experimentation to production efficiently, reliably, and repeatably.

Why MLOps Is Important

  • Consistency: Ensures your model behaves the same in training and production.
  • Reproducibility: Tracks experiments, datasets, and model versions.
  • Speed: Automates repetitive steps like training, testing, and deployment.
  • Monitoring: Detects model drift, performance issues, and data quality problems.
  • Collaboration: Connects data scientists, ML engineers, and DevOps through a shared workflow.

If you found this post interesting, I’d love to hear your thoughts in the blog post comments. Feel free to share your reactions or leave a comment. I truly value your input and engagement 😉


Your comments 🤔

  • Which tools are you using? Please share your experience and interests in the comments.
  • What do you think about MLOps?


Top comments (7)

Nevo David

honestly i always get confused with all the pipelines and model stuff, but this helps a bit - i still gotta figure out which tools fit me best tho

Ömer Berat Sezer • Edited

Basically, the 1st CI/CD is to train a better ML model during the development/data science phase; the 2nd one is for when there is concept/data drift (change) in prod, to build/train/deploy again automatically 😊

ESTHER NAISIMOI • Edited

incredible breakdown

Ömer Berat Sezer

thanks :)

Dotallio

Really appreciate the hands-on comparison of MLOps tools across the different clouds. I've bounced between SageMaker and Vertex AI a lot, so seeing them side by side is super helpful.

Ömer Berat Sezer

thanks, I tried to cover the MLOps tools on different platforms..

Ömer Berat Sezer • Edited

In my experience, MLOps enables fast experimentation, automated training, and reliable deployment, which are crucial for getting models into production efficiently, quickly, reliably.

Start debugging →