I Built an AI Skin Disease Detector with Flask, TensorFlow Lite, and Groq — Here's How

Joan — Thu, 26 Mar 2026 00:10:36 +0000

What if you could upload a photo of a skin lesion and get an AI-powered prediction in under 2 seconds — no signup, no data stored, completely free?

That's exactly what I built for my capstone thesis. SKIN is a web app that runs two CNN models: a 7-class skin lesion classifier trained on the HAM10000 dataset, and a binary monkeypox detector. It also uses Groq's Llama 4 Scout to generate plain-language medical explanations for each prediction.

Here's how I built it.

Live Demo | GitHub Repo

The Problem

Skin diseases are one of the most common reasons people visit a doctor, but access to dermatologists is limited in many parts of the world. Early detection of conditions like melanoma can be life-saving, yet most people don't know what to look for.

I wanted to build something that could give people a starting point — not a diagnosis, but an informed nudge to see a doctor.

Important disclaimer: This is an educational tool. It's not a medical device and should never replace a real dermatologist.

The Stack

Layer	Tech
Backend	Flask (Python)
ML Inference	TensorFlow Lite
AI Explanations	Groq API (Llama 4 Scout 17B)
Frontend	Tailwind CSS, Alpine.js
Charts	Chart.js
Deployment	Render (free tier)

I deliberately kept the stack simple. No React, no complex build pipeline. Jinja2 templates with Alpine.js for interactivity and Tailwind for styling. The entire app is a single app.py file.

The Models

Skin Lesion Classifier (HAM10000)

The HAM10000 dataset contains 10,015 dermatoscopic images of 7 types of skin lesions:

akiec — Actinic Keratosis
bcc — Basal Cell Carcinoma
bkl — Benign Keratosis
df — Dermatofibroma
mel — Melanoma
nv — Melanocytic Nevus (Mole)
vasc — Vascular Lesion

I trained a MobileNetV2-based CNN and converted it to TFLite for fast inference. The final model is just 2.7 MB — small enough to load instantly on a free-tier server.

Overall accuracy: 71.64%. Not clinical-grade, but solid for a thesis project. The biggest challenge was class imbalance — melanocytic nevi dominated the dataset (~67% of all images), which made the model biased toward predicting moles.
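One common mitigation for this kind of imbalance is inverse-frequency class weighting. The sketch below is illustrative rather than the exact training code: the per-class counts are the commonly cited HAM10000 distribution, and `balanced_class_weights` is a hypothetical helper name.

```python
# Per-class image counts for HAM10000 (commonly cited distribution;
# treat as an assumption and recompute from your own metadata file).
CLASS_COUNTS = {
    "akiec": 327,
    "bcc": 514,
    "bkl": 1099,
    "df": 115,
    "mel": 1113,
    "nv": 6705,
    "vasc": 142,
}


def balanced_class_weights(counts):
    """Return n_samples / (n_classes * count), so rare classes weigh more."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


weights = balanced_class_weights(CLASS_COUNTS)
# Print from the most common class (lowest weight) to the rarest (highest).
for label, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label:6s} {weight:6.2f}")
```

Keras accepts a mapping like this, keyed by class index rather than label, through the `class_weight` argument of `model.fit`.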

Monkeypox Detector

A separate binary classifier that distinguishes monkeypox lesions from other skin conditions. This one hits 95% accuracy — binary problems are inherently easier, and the visual features of monkeypox are quite distinct.

The Architecture

Here's the interesting part. The app loads each TFLite model the first time it's requested, then keeps it in memory:

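import tensorflow as tf

# Cache of loaded interpreters, keyed by model type.
loaded_models = {}

def get_model(model_type: str):
    # Load on first use; later calls reuse the cached interpreter.
    if model_type not in loaded_models:
        # model_path and input_size are looked up per model_type
        # elsewhere in app.py (that lookup is not shown here).
        interpreter = tf.lite.Interpreter(model_path=model_path, num_threads=4)
        # Pin the input shape to a single RGB image of the expected size.
        interpreter.resize_tensor_input(
            interpreter.get_input_details()[0]['index'],
            [1, input_size, input_size, 3]
        )
        interpreter.allocate_tensors()

        loaded_models[model_type] = {
            'interpreter': interpreter,
            'input_details': interpreter.get_input_details(),
            'output_details': interpreter.get_output_details(),
            'type': 'tflite'
        }

    return loaded_models[model_type]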

First prediction is slow (cold start), but subsequent predictions are near-instant because the interpreter is already allocated.
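Once the interpreter has produced its output vector, the scores still need to be mapped back to class names. Here is a minimal sketch of that step, assuming the output is a list of seven probabilities in the alphabetical label order shown. The real index order depends on how the training labels were encoded, and `top_predictions` is a hypothetical helper, not a function from the app.

```python
# Label order is an assumption: alphabetical, as many directory-based
# data loaders produce. Verify it against your own training pipeline.
LABELS = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]


def top_predictions(probabilities, labels=LABELS, k=3):
    """Pair each probability with its label and return the k most likely."""
    if len(probabilities) != len(labels):
        raise ValueError("expected one probability per label")
    ranked = sorted(
        zip(labels, probabilities), key=lambda pair: pair[1], reverse=True
    )
    # Convert to percentages with one decimal for display.
    return [(label, round(100.0 * p, 1)) for label, p in ranked[:k]]


# Made-up output vector for illustration, not a real model result.
example = [0.02, 0.73, 0.10, 0.01, 0.05, 0.08, 0.01]
print(top_predictions(example))
```

Showing the top few classes instead of only the winner also makes a low-confidence prediction easier to spot.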

Privacy by Design

One thing I'm proud of: no image ever touches disk. The uploaded file is read into memory, processed by the model, and discarded:

import io
from PIL import Image

image_bytes = file.read()  # read the upload straight into memory
pil_image = Image.open(io.BytesIO(image_bytes))
result = predict_image(pil_image, model_type, image_bytes=image_bytes)
# image_bytes goes out of scope and gets garbage collected

No database. No file system writes. No logging of uploads. If the server crashes, there's zero user data to leak.
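The same in-memory approach works for checking an upload before it reaches the model. This is a rough sketch, assuming JPEG and PNG are the allowed types (the actual whitelist isn't spelled out in this post) and using a hypothetical `validate_upload` helper. The 10 MB cap matches the limit mentioned under Security Considerations.

```python
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10 MB limit

# Magic-byte signatures for the formats assumed to be allowed.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}


def validate_upload(image_bytes):
    """Return the detected format or raise ValueError. Never writes to disk."""
    if not image_bytes:
        raise ValueError("empty upload")
    if len(image_bytes) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds 10 MB limit")
    # Trust the file's leading bytes rather than its name or extension.
    for fmt, signature in SIGNATURES.items():
        if image_bytes.startswith(signature):
            return fmt
    raise ValueError("unsupported file type")


print(validate_upload(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16))
```

A pixel-dimension cap would be checked one step later, after Pillow opens the image, by inspecting `Image.size`.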

Adding AI Explanations with Groq

Raw CNN output like "bcc — 73.2% confidence" isn't useful to most people. So I added Groq's free API to generate plain-language explanations:

import base64

def get_medgemma_explanation(image_bytes, cnn_label, confidence):
    # Encode the raw upload so it can be sent inline as a data URL.
    img_b64 = base64.b64encode(image_bytes).decode("ascii")
    prompt = (
        f"A CNN skin disease classifier detected: {cnn_label} "
        f"({confidence:.1f}% confidence).\n"
        "In plain text only, write 3 short paragraphs:\n"
        "1. What this condition is.\n"
        "2. What visual signs to look for.\n"
        "3. Recommended next steps for the patient."
    )
    response = groq_client.chat.completions.create(
        model="meta-llama/llama-4-scout-17b-16e-instruct",
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
        max_tokens=200,
    )
    return response.choices[0].message.content.strip()

Groq's free tier is fast enough for a demo app — responses come back in ~1 second. The Llama 4 Scout model is multimodal, so it can actually look at the image and correlate the CNN prediction with visual features.

If the API key isn't set, the feature gracefully degrades — no explanation shown, no error.
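That fallback can be sketched as a thin wrapper. The names here are assumptions: `explain_or_none` is a hypothetical helper, and `GROQ_API_KEY` is assumed to be the environment variable holding the key. Swallowing API errors as well is an extra design choice of this sketch, not something the app is confirmed to do.

```python
import os


def explain_or_none(explain_fn, *args, env=os.environ):
    """Call explain_fn only when an API key is configured."""
    if not env.get("GROQ_API_KEY"):
        return None  # feature disabled: the page just omits the explanation
    try:
        return explain_fn(*args)
    except Exception:
        # Assumption: a failed explanation should not break the prediction.
        return None


# With no key configured, the explainer is never called.
print(explain_or_none(lambda label: f"About {label}", "bcc", env={}))
```

The template can then render the explanation section only when the returned value is not `None`.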

Security Considerations

Even for a thesis project, I didn't want to cut corners on security:

CSRF protection — manual token-based validation using Python's secrets module (no extra dependencies)
SRI hashes — all CDN scripts have integrity attributes with SHA-384 hashes
Security headers — HSTS, X-Content-Type-Options, X-Frame-Options
Rate limiting — 3 analyses per session to prevent abuse
Input validation — file type whitelist, 10 MB size limit, 4096x4096 pixel cap

@app.after_request
def set_security_headers(response):
    response.headers['Strict-Transport-Security'] = 'max-age=31536000; includeSubDomains'
    response.headers['X-Content-Type-Options'] = 'nosniff'
    response.headers['X-Frame-Options'] = 'SAMEORIGIN'
    return response
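The manual CSRF check can be built from the standard library alone. Below is a minimal sketch of the idea, not the app's actual code: `issue_csrf_token` and `is_valid_csrf` are hypothetical names, and a plain dict stands in for Flask's session object.

```python
import hmac
import secrets


def issue_csrf_token(session):
    """Create a token once per session and return it for embedding in forms."""
    if "csrf_token" not in session:
        session["csrf_token"] = secrets.token_urlsafe(32)
    return session["csrf_token"]


def is_valid_csrf(session, submitted):
    """Compare the submitted token to the session's in constant time."""
    expected = session.get("csrf_token")
    if not expected or not submitted:
        return False
    # Compare as bytes so non-ASCII input cannot raise a TypeError.
    return hmac.compare_digest(expected.encode(), submitted.encode())


session = {}  # stands in for Flask's session object
token = issue_csrf_token(session)
print(is_valid_csrf(session, token), is_valid_csrf(session, "forged"))
```

Using `hmac.compare_digest` instead of `==` avoids leaking how many leading characters of a guess were correct through timing differences.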

Deploying on Render (Free Tier)

The app runs on Render's free plan with a single Gunicorn worker. TFLite keeps memory usage low enough to stay within limits:

# render.yaml
services:
  - type: web
    name: skin-disease-detection
    runtime: python
    plan: free
    startCommand: cd frontend && gunicorn -c ../gunicorn_config.py app:app
    healthCheckPath: /health

Cold starts take ~30 seconds (TensorFlow import + model loading), but once warm, predictions are fast.
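The `render.yaml` above points at a `gunicorn_config.py` that isn't shown. A plausible sketch follows, in which every value other than the single worker is an assumption rather than the project's real configuration:

```python
# gunicorn_config.py (illustrative values, not the project's actual file)
import os

# Render passes the port to bind to through the PORT environment variable.
bind = f"0.0.0.0:{os.environ.get('PORT', '10000')}"

# Single worker, so only one copy of TensorFlow and the models sits in memory.
workers = 1

# Assumption: a generous timeout so the slow first import isn't killed.
timeout = 120
```

Each extra worker would load its own interpreter copies, which is the main reason to stay at one on a memory-constrained plan.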

What I Learned

Class imbalance is the real boss fight. My model is great at detecting moles (85% accuracy) but struggles with rare conditions like dermatofibroma (31%). Oversampling and class weights help, but don't solve it completely.
TFLite is underrated for web apps. Going from a 58 MB H5 model to a 4.8 MB TFLite model with minimal accuracy loss was a game-changer for deployment.
LLM explanations add massive UX value. The Groq integration took ~30 lines of code but transformed the app from "here's a label and a number" to something actually useful.
You don't need React for everything. Alpine.js + Tailwind + Jinja2 gave me a modern, responsive UI with zero build step. The entire frontend is server-rendered HTML with sprinkles of interactivity.

Try It

Live Demo — upload any skin image and get an instant prediction.

GitHub — star the repo if you found this useful!

The app is open source under MIT. Fork it, improve the models, add new conditions, or swap in your own CNN. PRs welcome.

This project was built as a capstone thesis under the College of Computer and Information Sciences (CCIS). If you're working on something similar or have questions about the training pipeline, drop a comment below!
