<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Austin Deyan</title>
    <description>The latest articles on Forem by Austin Deyan (@austin_deyan_6c9b2445aed6).</description>
    <link>https://forem.com/austin_deyan_6c9b2445aed6</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3616341%2F630eda26-d8e2-4bef-b04b-ced93e823573.png</url>
      <title>Forem: Austin Deyan</title>
      <link>https://forem.com/austin_deyan_6c9b2445aed6</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/austin_deyan_6c9b2445aed6"/>
    <language>en</language>
    <item>
      <title>How I Built a Full-Stack ML App (and Fixed a 3GB Docker Image) 🐳</title>
      <dc:creator>Austin Deyan</dc:creator>
      <pubDate>Thu, 01 Jan 2026 17:05:26 +0000</pubDate>
      <link>https://forem.com/austin_deyan_6c9b2445aed6/how-i-built-a-full-stack-ml-app-and-fixed-a-3gb-docker-image-44ck</link>
      <guid>https://forem.com/austin_deyan_6c9b2445aed6/how-i-built-a-full-stack-ml-app-and-fixed-a-3gb-docker-image-44ck</guid>
      <description>&lt;p&gt;Most Machine Learning tutorials have a fatal flaw: They stop at the Notebook.&lt;/p&gt;

&lt;p&gt;You train a model, get a nice accuracy score, and then... nothing. The model sits in a .ipynb file gathering digital dust.&lt;/p&gt;

&lt;p&gt;I wanted to change that. I recently built an end-to-end Customer Conversion System that takes raw data, predicts purchasing behavior, and triggers automated marketing actions via a live API.&lt;/p&gt;

&lt;p&gt;Here is the journey from "Localhost" to "Production"—including how I accidentally built a 3.3GB Docker container and how I slashed it by 65%.&lt;/p&gt;

&lt;h2&gt;
  
  
  🏗️ The Tech Stack
&lt;/h2&gt;

&lt;p&gt;We aren't just fitting curves; we are shipping code.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Model:&lt;/strong&gt; XGBoost (Classification + Regression)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Backend:&lt;/strong&gt; Flask (Python)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Container:&lt;/strong&gt; Docker&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cloud:&lt;/strong&gt; Google Cloud Run (Serverless)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Frontend:&lt;/strong&gt; Streamlit&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Phase 1: The Logic (Beyond "0.85 Accuracy")
&lt;/h3&gt;

&lt;p&gt;A raw probability score isn't actionable. Marketing teams don't want to know "User 123 has a 0.82 score." They want to know what to do.&lt;/p&gt;

&lt;p&gt;I wrapped my XGBoost model in a "Decision Engine" function inside Python:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def determine_action(prob, days_to_buy, value):
    # High probability, High spender
    if prob &amp;gt; 0.8 and value &amp;gt; 2000:
        return f"VIP ALERT: Send Early Access Catalog. (Expected buy in {int(days_to_buy)} days)"

    # High probability, Low spender
    elif prob &amp;gt; 0.8:
        return "PROMO: Send 'Bundle Discount' to increase basket size."

    # Low probability, High historic value (Churn Risk)
    elif prob &amp;lt; 0.3 and value &amp;gt; 2000:
        return "RISK: Trigger Personal Outreach Call."

    else:
        return "NURTURE: Add to General Newsletter."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the API returns business strategy, not just math.&lt;/p&gt;
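&lt;p&gt;A quick smoke test that hits each branch once (the function is copied from above so the snippet runs standalone):&lt;/p&gt;

```python
# Re-stating the decision engine from the post so this snippet is self-contained
def determine_action(prob, days_to_buy, value):
    # High probability, High spender
    if prob > 0.8 and value > 2000:
        return f"VIP ALERT: Send Early Access Catalog. (Expected buy in {int(days_to_buy)} days)"
    # High probability, Low spender
    elif prob > 0.8:
        return "PROMO: Send 'Bundle Discount' to increase basket size."
    # Low probability, High historic value (Churn Risk)
    elif prob < 0.3 and value > 2000:
        return "RISK: Trigger Personal Outreach Call."
    else:
        return "NURTURE: Add to General Newsletter."

# One call per branch
print(determine_action(0.9, 4.2, 2500))   # VIP branch
print(determine_action(0.85, 10, 300))    # PROMO branch
print(determine_action(0.1, 90, 5000))    # RISK branch
print(determine_action(0.5, 30, 300))     # NURTURE branch
```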

&lt;h3&gt;
  
  
  Phase 2: The Docker Nightmare 🐳
&lt;/h3&gt;

&lt;p&gt;This was the biggest hurdle. I wrote a standard Dockerfile to wrap up my Flask API.&lt;/p&gt;

&lt;p&gt;I ran docker build, went to grab coffee, came back, and saw this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Successfully built...
Image size: 3.36 GB
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;3.36 GB. For a simple API? That’s unacceptable. It makes deployment slow and storage expensive.&lt;/p&gt;
&lt;h3&gt;
  
  
  🕵️‍♂️ The Investigation
&lt;/h3&gt;

&lt;p&gt;I ran a deep scan inside the container to see where the fat was hiding:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run --rm my-app du -ah /usr/local/lib/python3.9/site-packages | sort -rh | head -n 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output was shocking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;900MB+ in &lt;code&gt;nvidia/&lt;/code&gt; drivers.&lt;/li&gt;
&lt;li&gt;1GB+ in my local &lt;code&gt;.venv&lt;/code&gt; folder that I accidentally copied over.&lt;/li&gt;
&lt;/ul&gt;
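&lt;p&gt;If you'd rather run the "where is the fat hiding" scan in Python, a stdlib-only equivalent of that &lt;code&gt;du | sort | head&lt;/code&gt; pipeline might look like this (the &lt;code&gt;site-packages&lt;/code&gt; path in the usage comment is an assumption; adjust it to your image):&lt;/p&gt;

```python
import os

def dir_sizes(root):
    """Total bytes per immediate subdirectory of `root`, largest first."""
    sizes = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    total += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    pass  # skip broken symlinks and vanished files
        sizes[entry.name] = total
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical usage inside the container:
# for name, size in dir_sizes("/usr/local/lib/python3.9/site-packages")[:10]:
#     print(f"{size / 1e6:8.1f} MB  {name}")
```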

&lt;h3&gt;
  
  
  🛠️ The Fixes
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;The &lt;code&gt;.dockerignore&lt;/code&gt; file.&lt;/strong&gt; I was lazy and didn't create a &lt;code&gt;.dockerignore&lt;/code&gt; file, so Docker copied my local virtual environment (&lt;code&gt;.venv&lt;/code&gt;), Git history, and raw data into the image.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Added &lt;code&gt;.venv&lt;/code&gt;, &lt;code&gt;.git&lt;/code&gt;, and &lt;code&gt;data/&lt;/code&gt; to &lt;code&gt;.dockerignore&lt;/code&gt;.&lt;/p&gt;
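&lt;p&gt;For reference, the resulting &lt;code&gt;.dockerignore&lt;/code&gt; looks roughly like this (the last two entries are extra suggestions on top of the fix above, not something the build strictly required):&lt;/p&gt;

```
# .dockerignore — keep local-only artifacts out of the build context
.venv/
.git/
data/
__pycache__/
*.ipynb
```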

&lt;ol start="2"&gt;
&lt;li&gt;&lt;strong&gt;The XGBoost/NVIDIA trap.&lt;/strong&gt; It turns out that &lt;code&gt;pip install xgboost&lt;/code&gt; (latest version) often bundles massive NVIDIA CUDA libraries, even if you are only running on a CPU.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; I pinned the version to a lighter release in &lt;code&gt;requirements.txt&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;xgboost==1.7.6&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The Result: The image dropped from 3.36GB -&amp;gt; 1.2GB. Much better.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 3: Serverless Deployment (Google Cloud Run)
&lt;/h3&gt;

&lt;p&gt;I love Cloud Run for side projects. You give it a container, and it gives you an HTTPS URL. It scales to zero when no one is using it, meaning it costs $0/month for low traffic.&lt;/p&gt;

&lt;p&gt;Deploying was just three commands:&lt;/p&gt;


&lt;h3&gt;
  
  
  1. Tag the image
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;docker tag conversion-api gcr.io/my-project/conversion-api&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Push to Google Container Registry
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;docker push gcr.io/my-project/conversion-api&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Deploy
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;gcloud run deploy conversion-service --image gcr.io/my-project/conversion-api --platform managed&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Boom. A live API endpoint accessible from anywhere in the world.&lt;/p&gt;
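&lt;p&gt;Calling the live endpoint needs nothing beyond the standard library. The URL and payload fields below are placeholders (your Cloud Run URL and feature names will differ):&lt;/p&gt;

```python
import json
import urllib.request

# Hypothetical Cloud Run URL and input schema
API_URL = "https://conversion-service-xxxx.a.run.app/predict"
payload = {"prob_features": [0.4, 1.2, 3.0], "customer_value": 2500}

req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```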

&lt;h2&gt;
  
  
  Phase 4: The Frontend
&lt;/h2&gt;

&lt;p&gt;To make this usable for non-technical users, I threw together a Streamlit dashboard in about 50 lines of Python.&lt;/p&gt;

&lt;p&gt;It connects to the Cloud Run API and provides a UI for testing customer profiles.&lt;/p&gt;

&lt;h3&gt;
  
  
  📝 Key Takeaways
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;ML isn't done until it's deployed.&lt;/strong&gt; A model in a notebook delivers zero value.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Watch your dependencies.&lt;/strong&gt; &lt;code&gt;pip install&lt;/code&gt; is dangerous if you don't check what's being installed. That single XGBoost line cost me 1GB of space.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context matters.&lt;/strong&gt; Transforming a probability score into a "Next Best Action" makes your model 10x more valuable to stakeholders.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Have you ever struggled with massive Docker images in Python? Let me know in the comments!&lt;/p&gt;

</description>
      <category>python</category>
      <category>docker</category>
      <category>datascience</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>Zero-to-Scale ML: Deploying ONNX Models on Kubernetes with FastAPI and HPA</title>
      <dc:creator>Austin Deyan</dc:creator>
      <pubDate>Mon, 15 Dec 2025 18:42:28 +0000</pubDate>
      <link>https://forem.com/austin_deyan_6c9b2445aed6/zero-to-scale-ml-deploying-onnx-models-on-kubernetes-with-fastapi-and-hpa-l78</link>
      <guid>https://forem.com/austin_deyan_6c9b2445aed6/zero-to-scale-ml-deploying-onnx-models-on-kubernetes-with-fastapi-and-hpa-l78</guid>
      <description>&lt;p&gt;The path to scalable ML deployment requires high-performance APIs and robust orchestration. This post walks through setting up a local, highly available, and auto-scaling inference service using &lt;strong&gt;FastAPI&lt;/strong&gt; for speed and &lt;strong&gt;Kind&lt;/strong&gt; for Kubernetes orchestration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 1: The FastAPI Inference Service
&lt;/h2&gt;

&lt;p&gt;Our Python service handles ONNX model inference. The critical component for K8s stability is the &lt;code&gt;/health&lt;/code&gt; endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Python

# app.py snippet
# ... model loading logic ...

@app.get("/health")
def health_check():
    # K8s Probes will hit this endpoint frequently
    return {"status": "ok", "model_loaded": True}

# ... /predict endpoint ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Phase 2: Docker and Kubernetes Deployment
&lt;/h2&gt;

&lt;p&gt;After building the image (&lt;code&gt;clothing-classifier:latest&lt;/code&gt;) and loading it into Kind, we define the Deployment. Note the crucial resource constraints and probes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;YAML

# deployment.yaml (Snippet focusing on probes and resources)
        resources:
          requests:
            cpu: "250m"  # For scheduling
            memory: "500Mi"
          limits:
            cpu: "500m"  # To prevent monopolizing the node
            memory: "1Gi"
        livenessProbe:
          httpGet: {path: /health, port: 8000}
          initialDelaySeconds: 5
        readinessProbe:
          httpGet: {path: /health, port: 8000}
          initialDelaySeconds: 5 # Gives time for the ONNX model to load
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Phase 3: Implementing Horizontal Pod Autoscaler (HPA)
&lt;/h2&gt;

&lt;p&gt;Scalability is handled by the HPA, which requires the Metrics Server to be running.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;YAML

# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: clothing-classifier-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: clothing-classifier-deployment
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50 # Scale up if CPU exceeds 50%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;: Under load, the HPA dynamically adjusts replica count. This is the definition of &lt;strong&gt;elastic, cost-effective MLOps&lt;/strong&gt;.&lt;/p&gt;
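&lt;p&gt;Under the hood, the HPA's scaling rule is simple arithmetic: &lt;code&gt;desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)&lt;/code&gt;, clamped to the min/max bounds. A sketch with the values from the manifest above:&lt;/p&gt;

```python
import math

def desired_replicas(current, utilization, target=50, lo=2, hi=5):
    """Kubernetes HPA rule: ceil(current * utilization/target), clamped to [lo, hi]."""
    want = math.ceil(current * utilization / target)
    return max(lo, min(hi, want))

print(desired_replicas(2, 120))  # heavy load: scale out
print(desired_replicas(5, 10))   # idle: fall back to the floor
```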

&lt;p&gt;Read the full guide &lt;a href="https://medium.com/@meediax.digital/deploying-machine-learning-models-on-kubernetes-a-practical-guide-with-fastapi-docker-and-kind-048fdf1483f4" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you're deploying any Python API, adopting this pattern for resource management and scaling will save you major headaches down the road.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>kubernetes</category>
      <category>fastapi</category>
      <category>ai</category>
    </item>
    <item>
      <title>Serverless Deep Learning: From Notebook to Production with AWS Lambda</title>
      <dc:creator>Austin Deyan</dc:creator>
      <pubDate>Mon, 08 Dec 2025 10:02:25 +0000</pubDate>
      <link>https://forem.com/austin_deyan_6c9b2445aed6/serverless-deep-learning-from-notebook-to-production-with-aws-lambda-3386</link>
      <guid>https://forem.com/austin_deyan_6c9b2445aed6/serverless-deep-learning-from-notebook-to-production-with-aws-lambda-3386</guid>
      <description>&lt;p&gt;Training a model in a Jupyter Notebook is satisfying. But deploying it? That's where the headaches usually start. Today, I'm going to show you how to deploy a Keras image classifier using &lt;strong&gt;AWS Lambda&lt;/strong&gt; and &lt;strong&gt;TensorFlow Lite&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Serverless?
&lt;/h2&gt;

&lt;p&gt;AWS Lambda is "Serverless," meaning you don't manage the OS or hardware. You just upload code. It's cheap because you only pay when your code runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Heavyweight Problem 🏋️
&lt;/h2&gt;

&lt;p&gt;Standard TensorFlow is huge (approx. 1.7 GB). If you try to shove this into a Lambda function, you'll run into storage issues and slow performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Lightweight Solution ⚡
&lt;/h2&gt;

&lt;p&gt;We use TensorFlow Lite. It optimizes the model for inference (prediction) only, stripping out all the training logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: The Handler Code
&lt;/h2&gt;

&lt;p&gt;Your Python script needs a special function to handle the AWS event:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import tflite_runtime.interpreter as tflite
from keras_image_helper import create_preprocessor

# Load once at module level so warm Lambda invocations reuse the interpreter
interpreter = tflite.Interpreter(model_path='clothing-model.tflite')
interpreter.allocate_tensors()

def lambda_handler(event, context):
    url = event['url']
    # ... preprocessing and inference logic ...
    return result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: The Dockerfile
&lt;/h2&gt;

&lt;p&gt;We use Docker to package our dependencies. Crucial Tip: When installing the TF-Lite runtime from a URL, ensure you use the &lt;code&gt;raw&lt;/code&gt; version of the link, or &lt;code&gt;pip&lt;/code&gt; will throw a &lt;code&gt;BadZipFile&lt;/code&gt; error.&lt;/p&gt;
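&lt;p&gt;A Dockerfile sketch of the setup (the layout is an assumption; swap in your Python version, and use the &lt;em&gt;raw&lt;/em&gt; wheel link for &lt;code&gt;tflite_runtime&lt;/code&gt; where the placeholder is):&lt;/p&gt;

```dockerfile
# Dockerfile sketch — assumed file layout, placeholder wheel URL
FROM public.ecr.aws/lambda/python:3.10

RUN pip install keras-image-helper
RUN pip install <raw-link-to-tflite-runtime-wheel>

COPY clothing-model.tflite .
COPY lambda_function.py .

CMD ["lambda_function.lambda_handler"]
```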

&lt;h2&gt;
  
  
  Step 3: Deploy with Serverless Framework
&lt;/h2&gt;

&lt;p&gt;Instead of clicking buttons in the AWS console, we can use a &lt;code&gt;serverless.yml&lt;/code&gt; file to describe our infrastructure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;service: clothing-model
provider:
  name: aws
  ecr:
    images:
      appimage:
        path: ./
functions:
  predict:
    image:
      name: appimage
    events:
      - http:
          path: predict
          method: post
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running &lt;code&gt;serverless deploy&lt;/code&gt; handles the Docker build, ECR upload, and Lambda creation automatically!&lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>lambda</category>
      <category>tensorflow</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>PyTorch in Practice: Engineering a Custom CNN for Hair Texture Classification</title>
      <dc:creator>Austin Deyan</dc:creator>
      <pubDate>Mon, 01 Dec 2025 19:04:56 +0000</pubDate>
      <link>https://forem.com/austin_deyan_6c9b2445aed6/pytorch-in-practice-engineering-a-custom-cnn-for-hair-texture-classification-1b37</link>
      <guid>https://forem.com/austin_deyan_6c9b2445aed6/pytorch-in-practice-engineering-a-custom-cnn-for-hair-texture-classification-1b37</guid>
      <description>&lt;p&gt;In the current landscape of Computer Vision, the default move is often Transfer Learning—taking a massive model like ResNet50 and fine-tuning it. While effective, this often abstracts away the fundamental mechanics of how a network actually "sees" texture.&lt;/p&gt;

&lt;p&gt;For my latest project, I decided to build a Convolutional Neural Network (CNN) entirely from scratch using &lt;strong&gt;PyTorch&lt;/strong&gt;. My goal? To build a binary classifier capable of distinguishing between hair textures (e.g., &lt;strong&gt;Curly vs. Straight&lt;/strong&gt;) using the Kaggle Hair Type dataset.&lt;/p&gt;

&lt;p&gt;Here is a look under the hood of the architecture and the engineering decisions I made.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Data Pipeline: Why Augmentation Matters
&lt;/h2&gt;

&lt;p&gt;The input images were standardized to $200 \times 200$ pixels. However, training a model from scratch on a smaller dataset poses a high risk of overfitting—where the model memorizes the images rather than learning the features.&lt;/p&gt;

&lt;p&gt;To combat this, I engineered a robust training pipeline using &lt;code&gt;torchvision.transforms&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Instead of feeding the model static images, I applied dynamic transformations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Random Rotations (50°):&lt;/strong&gt; To handle different head tilts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Random Resized Crop:&lt;/strong&gt; To force the model to look at different scales of the hair strands.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Horizontal Flips:&lt;/strong&gt; To ensure directional invariance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Crucially, I kept the &lt;strong&gt;Test Set&lt;/strong&gt; deterministic (only resizing and normalizing) to ensure I had a stable benchmark for evaluation.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. The Architecture
&lt;/h2&gt;

&lt;p&gt;I opted for a lightweight, shallow architecture to test how much information could be extracted with minimal compute.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2cjh72zjj46lcou515ym.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2cjh72zjj46lcou515ym.jpeg" alt="Convolutional Neural Network" width="800" height="235"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input:&lt;/strong&gt; &lt;code&gt;(3, 200, 200)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature Extraction:&lt;/strong&gt; A generic convolutional layer (32 filters, $3\times3$ kernel) followed by ReLU activation and $2\times2$ Max Pooling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dimensionality Reduction:&lt;/strong&gt; A Flatten layer converting the 2D feature maps into a vector of over 313,000 features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Classification Head:&lt;/strong&gt; A dense hidden layer (64 neurons) leading to a &lt;strong&gt;single output neuron&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. The "Binary" Nuance
&lt;/h2&gt;

&lt;p&gt;Since I designed this as a binary classifier, the output layer and loss function had to be paired perfectly.&lt;/p&gt;

&lt;p&gt;I used a &lt;strong&gt;Sigmoid&lt;/strong&gt; activation on the final neuron to squash the output between 0 and 1 (representing probability). Consequently, I utilized &lt;strong&gt;Binary Cross Entropy Loss (&lt;code&gt;BCELoss&lt;/code&gt;)&lt;/strong&gt; rather than the standard Cross Entropy used in multi-class problems.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The Classification Head
&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fc1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;99&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fc2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sigmoid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Sigmoid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
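&lt;p&gt;For intuition, here is what &lt;code&gt;BCELoss&lt;/code&gt; computes on a single prediction, written out with plain &lt;code&gt;math&lt;/code&gt; (no PyTorch needed):&lt;/p&gt;

```python
import math

def bce(p, y):
    """Binary cross entropy for one sigmoid output p and true label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Confident and correct -> small loss; confident and wrong -> large loss
print(round(bce(0.9, 1), 4))  # 0.1054
print(round(bce(0.9, 0), 4))  # 2.3026
```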



&lt;h2&gt;
  
  
  4. Training for Reproducibility
&lt;/h2&gt;

&lt;p&gt;One of the biggest challenges in ML engineering is reproducibility. To ensure my results weren't just a fluke of random initialization, I strictly seeded the environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;SEED&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;
&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SEED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;manual_seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SEED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;backends&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cudnn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deterministic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I used &lt;strong&gt;Stochastic Gradient Descent (SGD)&lt;/strong&gt; with a learning rate of 0.002 and momentum of 0.8. I tracked the &lt;strong&gt;Median Training Accuracy&lt;/strong&gt; across epochs to filter out noise and the &lt;strong&gt;Mean Test Loss&lt;/strong&gt; to monitor generalization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;Building this from scratch reinforced several core Deep Learning concepts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Input math is critical:&lt;/strong&gt; Calculating the exact feature map size after convolution and pooling is necessary to line up the Linear layers.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Data is king:&lt;/strong&gt; The model performance improved significantly after introducing the RandomResizedCrop augmentation.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Simplicity works:&lt;/strong&gt; You don't always need a Transformer. For distinct textural differences, a simple CNN is fast, lightweight, and effective.&lt;/li&gt;
&lt;/ol&gt;
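&lt;p&gt;The "input math" in takeaway 1 is worth spelling out for this exact network: a 3×3 valid convolution then 2×2 max pooling on a 200×200 input lands precisely on the &lt;code&gt;32 * 99 * 99&lt;/code&gt; used in &lt;code&gt;fc1&lt;/code&gt;:&lt;/p&gt;

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (the same rule covers pooling)."""
    return (size + 2 * padding - kernel) // stride + 1

size = conv_out(200, kernel=3)             # 3x3 conv, no padding -> 198
size = conv_out(size, kernel=2, stride=2)  # 2x2 max pool -> 99
flattened = 32 * size * size               # 32 filters -> 313,632 features

print(size, flattened)  # 99 313632
```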

&lt;p&gt;#MachineLearning #PyTorch #ComputerVision #DeepLearning #DataScience #CNN&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>datascience</category>
      <category>github</category>
    </item>
    <item>
      <title>🌾 How I Built &amp; Deployed a Crop Yield Prediction API in the Cloud</title>
      <dc:creator>Austin Deyan</dc:creator>
      <pubDate>Mon, 17 Nov 2025 23:15:51 +0000</pubDate>
      <link>https://forem.com/austin_deyan_6c9b2445aed6/how-i-built-deployed-a-crop-yield-prediction-api-in-the-cloud-o8h</link>
      <guid>https://forem.com/austin_deyan_6c9b2445aed6/how-i-built-deployed-a-crop-yield-prediction-api-in-the-cloud-o8h</guid>
      <description>&lt;p&gt;Hey devs! 👋&lt;/p&gt;

&lt;p&gt;I just wrapped up a super interesting project and wanted to share the entire journey—wins, fails, and everything in between.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;An AI-powered crop yield prediction system that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Predicts harvest yields with an R² of 0.91&lt;/li&gt;
&lt;li&gt;Serves predictions via REST API&lt;/li&gt;
&lt;li&gt;Runs on Google Cloud Run&lt;/li&gt;
&lt;li&gt;Has a beautiful web UI&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Backend:  Python + Flask + Scikit-learn
DevOps:   Docker + Google Cloud Run
Frontend: Vanilla JS + HTML/CSS (keeping it simple!)
ML:       Gradient Boosting Regressor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Journey (Story Time! 📖)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Week 1: Data Exploration Hell 😅
&lt;/h3&gt;

&lt;p&gt;Started with messy agricultural data. Spent days just cleaning and understanding it. Pro tip: ALWAYS look at your data distributions first!&lt;/p&gt;
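&lt;p&gt;Even without pandas, a first look at a distribution is a few lines of stdlib (toy yield values here, not the project's data):&lt;/p&gt;

```python
import statistics

# Toy yield values (tons/hectare) standing in for a real column
yields = [2.1, 2.4, 2.2, 2.5, 9.8, 2.3, 2.6, 2.2]

print("mean  :", round(statistics.mean(yields), 2))
print("median:", round(statistics.median(yields), 2))
print("stdev :", round(statistics.stdev(yields), 2))
# A mean far above the median is a quick smell test for outliers like 9.8
```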

&lt;h3&gt;
  
  
  Week 2: Model Selection Drama 🤖
&lt;/h3&gt;

&lt;p&gt;Trained 7 models. Linear Regression? Terrible. Decision Trees? Overfitting. Random Forest? Better but slow. Gradient Boosting? &lt;em&gt;Chef's kiss&lt;/em&gt; 👌&lt;/p&gt;

&lt;p&gt;Here's the comparison:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Model&lt;/span&gt;               &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="err"&gt;²&lt;/span&gt; &lt;span class="n"&gt;Score&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;MAE&lt;/span&gt;
&lt;span class="o"&gt;--------------------|----------|--------&lt;/span&gt;
&lt;span class="n"&gt;Gradient&lt;/span&gt; &lt;span class="n"&gt;Boosting&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mf"&gt;0.913&lt;/span&gt;    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mf"&gt;0.31&lt;/span&gt;
&lt;span class="n"&gt;Random&lt;/span&gt; &lt;span class="n"&gt;Forest&lt;/span&gt;       &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mf"&gt;0.895&lt;/span&gt;    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mf"&gt;0.35&lt;/span&gt;
&lt;span class="n"&gt;Linear&lt;/span&gt; &lt;span class="n"&gt;Regression&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mf"&gt;0.623&lt;/span&gt;    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mf"&gt;0.89&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
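&lt;p&gt;If the metrics in that table feel fuzzy, both are a few lines of arithmetic (sketched with toy numbers, not the project's data):&lt;/p&gt;

```python
def mae(y_true, y_pred):
    """Mean absolute error: average size of the miss."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """R²: 1 minus residual variance over total variance."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, 4.0, 5.0, 6.0]
y_pred = [2.8, 4.1, 5.3, 5.9]

print(round(mae(y_true, y_pred), 3))  # 0.175
print(round(r2(y_true, y_pred), 3))   # 0.97
```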



&lt;h3&gt;
  
  
  Week 3: Docker Nightmares 🐳
&lt;/h3&gt;

&lt;p&gt;"Works on my machine" → Real problem.&lt;/p&gt;

&lt;p&gt;Issue #1: Model files not loading in container&lt;br&gt;
Solution: Load model at module level, not in &lt;code&gt;if __name__ == '__main__'&lt;/code&gt;&lt;/p&gt;
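&lt;p&gt;The module-level-load pattern in miniature (with a stand-in for the real pickle load):&lt;/p&gt;

```python
# app.py pattern: anything the WSGI server imports must run at module level
def load_model():
    # stand-in for something like joblib.load("model.pkl")
    return {"name": "gradient_boosting", "ready": True}

model = load_model()  # runs on import, so Gunicorn inside the container sees it

if __name__ == "__main__":
    # Only runs with `python app.py`; a WSGI server never enters this block,
    # which is why loading the model here broke inside the container
    print("dev server would start here")
```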

&lt;p&gt;Issue #2: CORS blocking requests&lt;br&gt;
Solution: &lt;code&gt;pip install flask-cors&lt;/code&gt; saved my life&lt;/p&gt;
&lt;h3&gt;
  
  
  Week 4: Cloud Deployment Victory! ☁️
&lt;/h3&gt;

&lt;p&gt;Google Cloud Run = Amazing for ML models&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Serverless (scales to zero!)&lt;/li&gt;
&lt;li&gt;Easy Docker deployment&lt;/li&gt;
&lt;li&gt;Built-in HTTPS&lt;/li&gt;
&lt;li&gt;Pay per request&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Code Snippets
&lt;/h2&gt;

&lt;p&gt;Here's the prediction endpoint (simplified):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/predict&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Preprocessing
&lt;/span&gt;    &lt;span class="n"&gt;input_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;input_encoded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_dummies&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;input_scaled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scaler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_encoded&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Prediction
&lt;/span&gt;    &lt;span class="n"&gt;prediction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_scaled&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;predicted_yield&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prediction&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tons_per_hectare&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Biggest Learnings
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Data &amp;gt; Models&lt;/strong&gt;: Feature engineering mattered more than model selection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment is Hard&lt;/strong&gt;: Spend time on DevOps early&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UI Matters&lt;/strong&gt;: Built a simple HTML interface—users loved it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation&lt;/strong&gt;: Write it as you code, not after!&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;🔗 &lt;a href="https://medium.com/@meediax.digital/from-raw-data-to-production-building-a-crop-yield-prediction-system-ml-zoomcamp-project-772f0b597e12?postPublishedType=initial" rel="noopener noreferrer"&gt;More info on Medium&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;Planning to add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Time-series forecasting&lt;/li&gt;
&lt;li&gt;[ ] Weather API integration&lt;/li&gt;
&lt;li&gt;[ ] Mobile app&lt;/li&gt;
&lt;li&gt;[ ] Model retraining pipeline&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Questions?
&lt;/h2&gt;

&lt;p&gt;Drop them in the comments! Happy to discuss anything about ML deployment, Docker, or agricultural AI! 👇&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>devops</category>
      <category>api</category>
    </item>
  </channel>
</rss>
