<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Vijay Govindaraja</title>
    <description>The latest articles on Forem by Vijay Govindaraja (@vijaygovindaraja).</description>
    <link>https://forem.com/vijaygovindaraja</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3847163%2Fd79d2436-e162-476b-ad26-80f01e4c119a.jpeg</url>
      <title>Forem: Vijay Govindaraja</title>
      <link>https://forem.com/vijaygovindaraja</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/vijaygovindaraja"/>
    <language>en</language>
    <item>
      <title>Tuning ML hyperparameters with a swarm optimizer inspired by parrot behavior</title>
      <dc:creator>Vijay Govindaraja</dc:creator>
      <pubDate>Sun, 12 Apr 2026 18:39:36 +0000</pubDate>
      <link>https://forem.com/vijaygovindaraja/tuning-ml-hyperparameters-with-a-swarm-optimizer-inspired-by-parrot-behavior-1c3k</link>
      <guid>https://forem.com/vijaygovindaraja/tuning-ml-hyperparameters-with-a-swarm-optimizer-inspired-by-parrot-behavior-1c3k</guid>
      <description>&lt;p&gt;When you train a neural network or any ML model, performance depends heavily on hyperparameters — learning rate, batch size, number of layers, regularization strength. Finding good values is expensive because each evaluation means training a model end to end.&lt;/p&gt;

&lt;p&gt;The standard approaches each have tradeoffs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Grid search&lt;/strong&gt; tries every combination on a predefined grid. It works for 2-3 parameters but scales exponentially. A 5-parameter search with 10 values each is 100,000 evaluations. Not practical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Random search&lt;/strong&gt; samples uniformly and usually finds a decent region faster than grid search. But it has no memory — it doesn't learn from previous evaluations to focus on promising areas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bayesian optimization&lt;/strong&gt; (what Optuna and Hyperopt use under the hood) builds a surrogate model of the objective and samples where improvement is most likely. Very sample-efficient in low dimensions. But the surrogate model itself becomes expensive to fit in high-dimensional spaces, and it can get stuck when the objective surface has many local optima.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Swarm methods&lt;/strong&gt; like PSO (Particle Swarm Optimization) maintain a population of candidate solutions that share information about good regions. They scale better to high dimensions than Bayesian methods. The failure mode is premature convergence: every particle gets pulled toward the same global best, the swarm loses diversity, and it can't escape a local optimum once it's trapped.&lt;/p&gt;

&lt;p&gt;That last problem — premature convergence in swarm methods — is what I was trying to address.&lt;/p&gt;

&lt;h2&gt;
  
  
  The idea behind MSPO
&lt;/h2&gt;

&lt;p&gt;Standard PSO gives every particle the same update rule every iteration: move toward your personal best, move toward the swarm's global best, add some inertia. The weights change over time, but the &lt;em&gt;type&lt;/em&gt; of movement is always the same. This makes the swarm predictable, which is exactly the wrong property when you're trying to explore a complex landscape.&lt;/p&gt;
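
&lt;p&gt;For reference, here is a minimal sketch of that canonical update (my own illustration with typical textbook coefficients, not code from any particular library):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    # inertia + pull toward personal best + pull toward the swarm's global best
    rng = rng or np.random.default_rng()
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v_new, v_new
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
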

&lt;p&gt;MSPO (Multi-Strategy Parrot Optimizer) takes a different approach: instead of one update rule, there are four. Each iteration, each agent in the swarm independently and randomly picks one of the four behaviors. Some agents explore aggressively, some exploit locally, some follow the crowd, some deliberately go against it. The swarm maintains diversity because different agents are doing genuinely different things at the same time.&lt;/p&gt;

&lt;p&gt;The four behaviors are loosely inspired by how parrots behave in groups:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Foraging.&lt;/strong&gt; The agent takes a Levy flight — a random step with a heavy-tailed distribution, meaning it usually takes small steps but occasionally jumps far. The step is scaled by the distance to the global best and pulled toward the population mean. This is the exploration behavior. Early in the run the mean-pull is strong (the flock stays together); late in the run it fades and agents explore independently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Staying.&lt;/strong&gt; The agent drifts toward the global best with some random noise that shrinks over time. This is pure exploitation — fine-tuning around a known good region.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Communicating.&lt;/strong&gt; A coin flip. Half the time the agent moves toward the group mean (flocking). The other half it moves in a random direction with a decaying step size (going off alone). This creates a mix of conformist and nonconformist behavior in every iteration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fear of strangers.&lt;/strong&gt; The position update combines attraction toward the best solution with repulsion from the current position, modulated by a chaotic sequence from a Tent map. The chaos prevents the swarm from settling into a fixed pattern during the later stages of optimization.&lt;/p&gt;
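
&lt;p&gt;A rough sketch of that per-agent dispatch, just to make the idea concrete (this is my own simplified illustration, not the package's internal code, and the real update equations carry more terms):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def stay(x, gbest, mean_pos, rng, scale=0.1):
    # Exploitation-style move: drift toward the global best with small noise
    return x + scale * rng.random(x.shape) * (gbest - x)

def mspo_iteration(positions, gbest, behaviors, rng):
    # behaviors = [forage, stay, communicate, fear] -- illustrative placeholders
    mean_pos = positions.mean(axis=0)
    for i in range(len(positions)):
        step = behaviors[rng.integers(len(behaviors))]   # each agent picks a behavior at random
        positions[i] = step(positions[i], gbest, mean_pos, rng)
    return positions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
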

&lt;p&gt;Three additional components support the behaviors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sobol initialization&lt;/strong&gt;: instead of scattering agents randomly, a low-discrepancy sequence covers the search space more uniformly from the start. Less wasted exploration in the opening iterations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exponentially decaying inertia weight&lt;/strong&gt;: starts high (broad jumps) and decays toward a low floor (small refinements). The algorithm naturally transitions from exploration to exploitation without manual tuning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parametric Tent map&lt;/strong&gt;: generates a deterministic but unpredictable chaotic sequence that modulates the "fear of strangers" behavior. Structured chaos is better than pure randomness for escaping local optima.&lt;/li&gt;
&lt;/ul&gt;
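
&lt;p&gt;A compact sketch of these three support pieces, using SciPy's Sobol sampler (the decay constants and Tent-map parameter in the package may differ; treat the numbers as illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np
from scipy.stats import qmc

def sobol_init(n_agents, bounds, seed=0):
    # Low-discrepancy starting positions, scaled into the search box
    sampler = qmc.Sobol(d=len(bounds), scramble=True, seed=seed)
    unit = sampler.random(n_agents)          # points in the unit hypercube
    lo, hi = np.array(bounds).T
    return lo + unit * (hi - lo)

def inertia(t, max_iter, w_start=0.9, w_end=0.2):
    # Exponential decay from broad jumps toward small refinements
    return w_end + (w_start - w_end) * np.exp(-3.0 * t / max_iter)

def tent_map(x, mu=0.7):
    # Parametric Tent map: a deterministic but chaotic sequence in (0, 1)
    return x / mu if x &amp;lt; mu else (1.0 - x) / (1.0 - mu)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
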

&lt;h2&gt;
  
  
  How to use it
&lt;/h2&gt;

&lt;p&gt;Install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;mspo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Basic usage — minimize any function that takes a numpy array and returns a float:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mspo&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MSPO&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;opt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MSPO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bounds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;n_parrots&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;opt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;# ~1.6e-12
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# ~zeros
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Tuning a classifier
&lt;/h3&gt;

&lt;p&gt;Here's a more realistic example — tuning a random forest on your own dataset:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RandomForestClassifier&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cross_val_score&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mspo&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MSPO&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;n_estimators&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;max_depth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;min_samples_split&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;clf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RandomForestClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;min_samples_split&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;min_samples_split&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cross_val_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;  &lt;span class="c1"&gt;# MSPO minimizes, so negate accuracy
&lt;/span&gt;
&lt;span class="n"&gt;opt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MSPO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bounds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;    &lt;span class="c1"&gt;# n_estimators
&lt;/span&gt;        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;      &lt;span class="c1"&gt;# max_depth
&lt;/span&gt;        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;      &lt;span class="c1"&gt;# min_samples_split
&lt;/span&gt;    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;n_parrots&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;opt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Best CV accuracy: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_value&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few practical notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MSPO minimizes.&lt;/strong&gt; If you want to maximize accuracy, negate the return value.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integer parameters&lt;/strong&gt; need to be cast with &lt;code&gt;int()&lt;/code&gt; inside your objective. The optimizer works in continuous space; you handle the rounding.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The seed parameter&lt;/strong&gt; makes runs fully reproducible — same seed, same result, every time. Use this when comparing different objective functions or parameter bounds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;n_parrots and max_iter&lt;/strong&gt; control the computational budget. Total evaluations = n_parrots × (1 + max_iter). For expensive objectives (each evaluation takes minutes), use fewer parrots and iterations. For cheap objectives (milliseconds per eval), you can afford more.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tuning a neural network
&lt;/h3&gt;

&lt;p&gt;Same pattern works for PyTorch or TensorFlow — just wrap your training loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mspo&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MSPO&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;lr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;          &lt;span class="c1"&gt;# log-scale: params[0] in [-5, -1]
&lt;/span&gt;    &lt;span class="n"&gt;weight_decay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# log-scale: params[1] in [-6, -2]
&lt;/span&gt;    &lt;span class="n"&gt;batch_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MyModel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;optimizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Adam&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;lr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;weight_decay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;weight_decay&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;val_loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_and_evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;val_loss&lt;/span&gt;

&lt;span class="n"&gt;opt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MSPO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bounds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;       &lt;span class="c1"&gt;# log10(learning_rate)
&lt;/span&gt;        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;       &lt;span class="c1"&gt;# log10(weight_decay)
&lt;/span&gt;        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;      &lt;span class="c1"&gt;# batch_size
&lt;/span&gt;    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;n_parrots&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;opt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the log-scale trick for learning rate and weight decay. These parameters span several orders of magnitude, so searching in log space gives the optimizer a more uniform landscape to work with.&lt;/p&gt;

&lt;h2&gt;
  
  
  Does it actually work?
&lt;/h2&gt;

&lt;p&gt;I validated against the official CEC 2022 benchmark suite — 12 functions (unimodal, multimodal, hybrid, composition) with the published shift vectors and rotation matrices from the competition organizers. This is the standard test used in the metaheuristics community to compare optimizers on a level playing field.&lt;/p&gt;

&lt;p&gt;Setup: 30 agents, 1000 iterations, 10 dimensions, 30 independent runs per function. Compared against canonical PSO (constriction coefficients) and random search (same evaluation budget).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;MSPO&lt;/th&gt;
&lt;th&gt;PSO&lt;/th&gt;
&lt;th&gt;Random&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;F1 Zakharov&lt;/td&gt;
&lt;td&gt;unimodal&lt;/td&gt;
&lt;td&gt;45.06&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.00&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5436&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;F2 Rosenbrock&lt;/td&gt;
&lt;td&gt;unimodal&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10.05&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;62.98&lt;/td&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;F3 Schaffer F7&lt;/td&gt;
&lt;td&gt;multimodal&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.7e-04&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.03&lt;/td&gt;
&lt;td&gt;0.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;F4 Rastrigin&lt;/td&gt;
&lt;td&gt;multimodal&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;38.00&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;51.00&lt;/td&gt;
&lt;td&gt;87.90&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;F5 Levy&lt;/td&gt;
&lt;td&gt;multimodal&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.55&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.67&lt;/td&gt;
&lt;td&gt;2.47&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;F6 Hybrid 1&lt;/td&gt;
&lt;td&gt;hybrid&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;39637&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;58895&lt;/td&gt;
&lt;td&gt;10.4M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;F7 Hybrid 2&lt;/td&gt;
&lt;td&gt;hybrid&lt;/td&gt;
&lt;td&gt;60.02&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;29.50&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;214&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;F8 Hybrid 3&lt;/td&gt;
&lt;td&gt;hybrid&lt;/td&gt;
&lt;td&gt;472&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;298&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1061&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;F9 Composition 1&lt;/td&gt;
&lt;td&gt;composition&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;398&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;426&lt;/td&gt;
&lt;td&gt;516&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;F10 Composition 2&lt;/td&gt;
&lt;td&gt;composition&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-1254&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;29.57&lt;/td&gt;
&lt;td&gt;-366&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;F11 Composition 3&lt;/td&gt;
&lt;td&gt;composition&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.32&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;79.14&lt;/td&gt;
&lt;td&gt;117&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;F12 Composition 4&lt;/td&gt;
&lt;td&gt;composition&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;165&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;170&lt;/td&gt;
&lt;td&gt;242&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Values are median error (f(x) - f*) over 30 runs. Lower is better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MSPO wins on 9 of 12 functions.&lt;/strong&gt; PSO wins on Zakharov (a smooth unimodal function where simple attraction-to-best is enough) and two hybrid functions. MSPO's advantage shows up most clearly on the composition functions (F9-F12), where the landscape has multiple overlapping basins — exactly where behavioral diversity matters.&lt;/p&gt;

&lt;p&gt;Where it doesn't win: purely unimodal problems where the shortest path to the minimum is a straight line. PSO's simple "move toward best" is hard to beat when there are no local optima to escape from. If your hyperparameter landscape is likely smooth and unimodal, PSO or Bayesian optimization may be more appropriate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adapting it to your problem
&lt;/h2&gt;

&lt;p&gt;Some guidelines based on what I've found works:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When MSPO is a good fit:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hyperparameter spaces with 3+ dimensions where grid search is too expensive&lt;/li&gt;
&lt;li&gt;Objectives with multiple local optima (most neural network training landscapes)&lt;/li&gt;
&lt;li&gt;Situations where you can afford 500-30000 evaluations but need better results than random search&lt;/li&gt;
&lt;li&gt;When you want reproducible tuning (set the seed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When something else might be better:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Very expensive objectives where you can only afford &amp;lt;100 evaluations — use Bayesian optimization (Optuna)&lt;/li&gt;
&lt;li&gt;Purely combinatorial/discrete spaces — MSPO works in continuous space, so you'd need to round parameters. Evolutionary methods designed for discrete spaces may be more natural.&lt;/li&gt;
&lt;li&gt;Low-dimensional smooth problems (1-2 parameters) — grid search or scipy.optimize will be simpler and effective&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tuning MSPO itself:&lt;/strong&gt;&lt;br&gt;
The default parameters (from the paper) work well as a starting point. The main knobs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;n_parrots&lt;/code&gt;: more agents = better coverage but more evaluations per iteration. 20-50 is a reasonable range.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;max_iter&lt;/code&gt;: more iterations = finer convergence. 200-1000 depending on your budget.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;seed&lt;/code&gt;: always set this for reproducibility. Run with 3-5 different seeds and take the best if you want robustness.&lt;/li&gt;
&lt;/ul&gt;
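
&lt;p&gt;That multi-seed pattern is just a small loop over the same interface used in the examples above (&lt;code&gt;objective&lt;/code&gt; and &lt;code&gt;bounds&lt;/code&gt; assumed to be defined already):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;best = None
for seed in (0, 1, 2, 3, 4):
    result = MSPO(objective=objective, bounds=bounds,
                  n_parrots=30, max_iter=500, seed=seed).run()
    # Keep the best of the independent runs
    if best is None or result.best_value &amp;lt; best.best_value:
        best = result
print(best.best_value, best.best_params)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
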

&lt;p&gt;The inertia weight, Tent map parameters, and Levy flight beta all have paper-validated defaults and I haven't found a case where changing them helps significantly. Leave them alone unless you have a specific reason.&lt;/p&gt;
&lt;h2&gt;
  
  
  Source code and reproduction
&lt;/h2&gt;

&lt;p&gt;Everything is on GitHub with 73 unit tests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Repo&lt;/strong&gt;: &lt;a href="https://github.com/vijaygovindaraja/mspo" rel="noopener noreferrer"&gt;github.com/vijaygovindaraja/mspo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyPI&lt;/strong&gt;: &lt;code&gt;pip install mspo&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To reproduce the CEC 2022 benchmarks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;mspo[benchmark]
python benchmarks/run_cec2022.py &lt;span class="nt"&gt;--quick&lt;/span&gt;  &lt;span class="c"&gt;# ~2 min smoke test&lt;/span&gt;
python benchmarks/run_cec2022.py          &lt;span class="c"&gt;# full run, ~2.5 hours&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The package is 6 source files, each under 200 lines. If you want to understand or modify the algorithm, start with &lt;code&gt;mspo/behaviors.py&lt;/code&gt; (the four update rules) and &lt;code&gt;mspo/optimizer.py&lt;/code&gt; (the main loop).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Paper&lt;/strong&gt;: Govindarajan, V. (2025). MSPO: A machine learning hyperparameter optimization method for enhanced breast cancer image classification. &lt;em&gt;Digital Health&lt;/em&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>parrot</category>
      <category>machinelearning</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How I Built Aegis-5: An Ensemble Framework That Detects 99.98% of IIoT Intrusions</title>
      <dc:creator>Vijay Govindaraja</dc:creator>
      <pubDate>Sun, 29 Mar 2026 03:54:35 +0000</pubDate>
      <link>https://forem.com/vijaygovindaraja/how-i-built-aegis-5-an-ensemble-framework-that-detects-9998-of-iiot-intrusions-39bk</link>
      <guid>https://forem.com/vijaygovindaraja/how-i-built-aegis-5-an-ensemble-framework-that-detects-9998-of-iiot-intrusions-39bk</guid>
      <description>&lt;p&gt;Factory floors in 2026 look nothing like they did a decade ago. Robots collaborate with humans, sensors talk to cloud systems, and every machine is a node on the network. This is Industry 5.0 — and it's a massive attack surface.&lt;/p&gt;

&lt;p&gt;I spent the last year building &lt;strong&gt;Aegis-5&lt;/strong&gt;, a hybrid ensemble framework for intrusion detection in these environments. The work was published in &lt;a href="https://doi.org/10.1145/3787224" rel="noopener noreferrer"&gt;ACM Transactions on Autonomous and Adaptive Systems&lt;/a&gt;, and I've open-sourced the full implementation. Here's how it works and why existing approaches fall short.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Industrial IoT networks generate diverse traffic — normal SCADA commands, sensor telemetry, actuator signals — alongside attack patterns that look increasingly like legitimate traffic. Traditional IDSs struggle here because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Single classifiers can't generalize&lt;/strong&gt; across the wide variety of IIoT attack types (DDoS, reconnaissance, spoofing, botnet C2, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static models degrade&lt;/strong&gt; as traffic patterns shift during production cycles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero-day attacks&lt;/strong&gt; bypass signature-based and even some ML-based systems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Class imbalance&lt;/strong&gt; — in real IIoT traffic, some attack types are extremely rare&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Idea Behind Aegis-5
&lt;/h2&gt;

&lt;p&gt;Instead of betting on one classifier, Aegis-5 combines five fundamentally different learners and lets them vote — but the voting isn't equal. Each classifier's vote is weighted dynamically based on how well it's been performing &lt;em&gt;on each specific attack class&lt;/em&gt; in recent predictions.&lt;/p&gt;

&lt;p&gt;The five classifiers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Random Forest&lt;/strong&gt; — handles high-dimensional feature spaces well&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradient Boosting&lt;/strong&gt; — strong on structured/tabular data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;XGBoost&lt;/strong&gt; — efficient gradient boosting with regularization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SVM&lt;/strong&gt; — effective decision boundaries in transformed feature space&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;KNN&lt;/strong&gt; — captures local neighborhood patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each brings a different inductive bias. That diversity is the whole point.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dynamic Weighting: The Core Innovation
&lt;/h2&gt;

&lt;p&gt;Here's where Aegis-5 diverges from standard ensembles. Instead of fixed weights or simple majority voting, we maintain a sliding window of the last K=1000 predictions for each classifier and compute per-class F1 scores in real time.&lt;/p&gt;

&lt;p&gt;The weight for classifier &lt;em&gt;i&lt;/em&gt; on class &lt;em&gt;c&lt;/em&gt; is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;w_i,c = exp(beta * F1_i,c) / sum_j(exp(beta * F1_j,c))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is just a softmax over the per-class F1 scores, with beta (2.0 by default) acting as an inverse temperature: the larger beta is, the more sharply the weights concentrate on the best-performing classifiers. When a classifier is nailing a specific attack type, its weight for that class goes up. When it's struggling, the ensemble naturally shifts trust to the classifiers that are performing better.&lt;/p&gt;

&lt;p&gt;In Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DynamicWeightManager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_classifiers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_classes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;windows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;deque&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maxlen&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;window_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_classifiers&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;n_classifiers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_classes&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n_classifiers&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;beta&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_recompute_weights&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;f1_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_classifiers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_classes&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_classifiers&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;windows&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
            &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;windows&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;y_true&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;per_class_f1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;f1_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;average&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_classes&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;zero_division&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;f1_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;per_class_f1&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;per_class_f1&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_classes&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;f1_scores&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;exp_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;exp_scores&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;exp_scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Meta-Learner Layer
&lt;/h2&gt;

&lt;p&gt;On top of the dynamically-weighted base predictions, a Logistic Regression meta-learner synthesizes the final output. It takes the weighted probability vectors from all five classifiers as input features and learns the optimal combination.&lt;/p&gt;
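
&lt;p&gt;In sketch form, that stacking step might look like the following. This is my own simplified illustration of the design described above, not necessarily the repo's exact code; &lt;code&gt;classifiers&lt;/code&gt;, &lt;code&gt;weights&lt;/code&gt;, &lt;code&gt;X_val&lt;/code&gt;, and &lt;code&gt;y_val&lt;/code&gt; are assumed to exist:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np
from sklearn.linear_model import LogisticRegression

def build_meta_features(classifiers, weights, X):
    # One block of class probabilities per base classifier, scaled by that
    # classifier's current per-class weights, stacked side by side
    return np.hstack([clf.predict_proba(X) * weights[i]
                      for i, clf in enumerate(classifiers)])

# Fit the meta-learner on held-out data so it learns how to combine the base votes
meta_learner = LogisticRegression(max_iter=1000)
meta_learner.fit(build_meta_features(classifiers, weights, X_val), y_val)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
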

&lt;p&gt;But here's the twist — we don't blindly trust the meta-learner either.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hybrid Voting Protocol
&lt;/h2&gt;

&lt;p&gt;The final prediction uses a confidence threshold (tau=0.95):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High confidence&lt;/strong&gt; (meta-learner probability &amp;gt;= tau): use soft voting with the meta-learner's output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low confidence&lt;/strong&gt;: fall back to hard voting (weighted majority) across all five classifiers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This hybrid approach means the system is aggressive when it's confident and conservative when it's uncertain — exactly what you want in a security-critical environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_hybrid_predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;meta_proba&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;meta_learner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict_proba&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;meta_features&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;max_confidence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;meta_proba&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;empty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;high_conf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence_threshold&lt;/span&gt;
    &lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;high_conf&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;meta_proba&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;high_conf&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Hard voting fallback for low-confidence samples
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="n"&gt;high_conf&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;votes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;clf&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;classifiers&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;weight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weight_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_weights&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;votes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;votes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt;
        &lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;votes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;votes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;predictions&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Preprocessing Pipeline
&lt;/h2&gt;

&lt;p&gt;IIoT data is messy. Our pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Median imputation&lt;/strong&gt; for missing values (robust to outliers from sensor noise)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;StandardScaler&lt;/strong&gt; normalization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ANOVA F-test + Recursive Feature Elimination with Cross-Validation (RFECV)&lt;/strong&gt; — keeps only statistically significant features&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PCA&lt;/strong&gt; for dimensionality reduction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SMOTE&lt;/strong&gt; to handle class imbalance (rare attack types)&lt;/li&gt;
&lt;/ol&gt;
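
&lt;p&gt;A rough scikit-learn / imbalanced-learn sketch of those steps (the component settings here are illustrative rather than the exact values from the paper, RFECV is omitted for brevity, and &lt;code&gt;X_train&lt;/code&gt;/&lt;code&gt;y_train&lt;/code&gt; are your raw training split):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA
from imblearn.over_sampling import SMOTE

preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # robust to sensor-noise outliers
    ("scale", StandardScaler()),
    ("anova", SelectKBest(f_classif, k=40)),        # k is illustrative
    ("pca", PCA(n_components=0.95)),                # keep 95% of the variance
])

X_proc = preprocess.fit_transform(X_train, y_train)
# Oversample the rare attack classes on the transformed training split only
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_proc, y_train)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
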

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;We evaluated on two benchmark IIoT datasets:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dataset&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;th&gt;Precision&lt;/th&gt;
&lt;th&gt;Recall&lt;/th&gt;
&lt;th&gt;F1-Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IoT-23&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;99.98%&lt;/td&gt;
&lt;td&gt;99.97%&lt;/td&gt;
&lt;td&gt;99.96%&lt;/td&gt;
&lt;td&gt;99.96%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CIC-IoT 2023&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;99.95%&lt;/td&gt;
&lt;td&gt;99.93%&lt;/td&gt;
&lt;td&gt;99.92%&lt;/td&gt;
&lt;td&gt;99.93%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These numbers beat prior state-of-the-art approaches on both datasets. More importantly, the per-class metrics show strong performance even on rare attack types — which is where most single-classifier systems fail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The full implementation is open-source:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/vijaygovindaraja/Aegis5.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Aegis5
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
python demo.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use it in your own project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aegis5&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Aegis5&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Aegis5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;confidence_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;use_feature_selection&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;use_pca&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;use_smote&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Accuracy: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Building Aegis-5 reinforced a few things for me:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diversity beats complexity.&lt;/strong&gt; Five relatively simple classifiers with smart weighting outperformed deeper, more complex individual models. The key is that each classifier fails differently — and the ensemble exploits that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Adaptive systems matter in production.&lt;/strong&gt; Static models decay. The sliding window approach means Aegis-5 adapts as traffic patterns change without retraining from scratch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hybrid strategies beat pure strategies.&lt;/strong&gt; The soft/hard voting hybrid outperformed both pure soft voting and pure hard voting. Knowing when to be confident and when to be cautious is underrated in ML system design.&lt;/p&gt;

&lt;h2&gt;
  
  
  Paper
&lt;/h2&gt;

&lt;p&gt;The full paper is published in ACM Transactions on Autonomous and Adaptive Systems:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Govindarajan, V., Ahmed, F., Faheem, Z.B., Bilal, M., Ayadi, M., &amp;amp; Ali, J. (2026). Aegis-5: A Hybrid Ensemble Framework for Intrusion Detection in Industry 5.0 Driven Smart Manufacturing Environment. &lt;em&gt;ACM TAAS&lt;/em&gt;. &lt;a href="https://doi.org/10.1145/3787224" rel="noopener noreferrer"&gt;DOI: 10.1145/3787224&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;If you're working on IIoT security or ensemble methods, I'd love to hear your thoughts. Drop a comment or open an issue on the &lt;a href="https://github.com/vijaygovindaraja/Aegis5" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>iot</category>
      <category>networking</category>
      <category>programming</category>
    </item>
    <item>
      <title>I Built a Free WCAG Accessibility Audit CLI for Government Teams</title>
      <dc:creator>Vijay Govindaraja</dc:creator>
      <pubDate>Sat, 28 Mar 2026 06:57:43 +0000</pubDate>
      <link>https://forem.com/vijaygovindaraja/i-built-a-free-wcag-accessibility-audit-cli-for-government-teams-bbp</link>
      <guid>https://forem.com/vijaygovindaraja/i-built-a-free-wcag-accessibility-audit-cli-for-government-teams-bbp</guid>
      <description>&lt;p&gt;Every government website in the US is required to meet Section 508 accessibility standards. Most commercial tools cost hundreds per month. So I built an open source alternative.&lt;br&gt;
**&lt;br&gt;&lt;br&gt;
The Problem                                                                                                         **&lt;br&gt;&lt;br&gt;
  If you're a developer working on a .gov site, you need to verify WCAG compliance before every deploy. Your options&lt;br&gt;&lt;br&gt;
  are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manual testing — slow, inconsistent, doesn't scale
&lt;/li&gt;
&lt;li&gt;Commercial tools (Siteimprove, Level Access) — $500+/month&lt;/li&gt;
&lt;li&gt;Browser extensions (axe DevTools) — great for one page, but can't scan a whole site or run in CI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted something that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Runs from the terminal
&lt;/li&gt;
&lt;li&gt;Scans entire sites via sitemap
&lt;/li&gt;
&lt;li&gt;Outputs JSON/CSV for CI pipelines&lt;/li&gt;
&lt;li&gt;Costs nothing
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  wcag-audit
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npx wcag-audit scan https://your-site.gov
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That's it. No API keys, no account, no config files.&lt;/p&gt;

&lt;p&gt;It launches a headless browser, injects &lt;a href="https://github.com/dequelabs/axe-core" rel="noopener noreferrer"&gt;axe-core&lt;/a&gt; (the same engine Google and Microsoft use), and returns a report with every WCAG violation, the affected elements, and how to fix them.&lt;/p&gt;
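&lt;p&gt;For context, here is roughly what that pattern looks like if you wire it up yourself with Puppeteer and the @axe-core/puppeteer wrapper. This is a minimal sketch of the general approach, not wcag-audit's actual source:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Sketch only: scan one page with Puppeteer + axe-core, then print each violation.
const puppeteer = require('puppeteer');
const { AxePuppeteer } = require('@axe-core/puppeteer');

(async () =&amp;gt; {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://example.gov', { waitUntil: 'networkidle2' });

  // Restrict axe to WCAG A/AA rules
  const results = await new AxePuppeteer(page)
    .withTags(['wcag2a', 'wcag2aa'])
    .analyze();

  for (const violation of results.violations) {
    console.log(`[${violation.impact}] ${violation.id}: ${violation.help}`);
  }

  await browser.close();
})();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;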

&lt;h2&gt;
  
  
  What the output looks like
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;════════════════════════════════════════════════════
    WCAG ACCESSIBILITY AUDIT REPORT
════════════════════════════════════════════════════

URL:      https://example.gov
Title:    Example Government Site
Level:    WCAG AA

Critical: 2  Serious: 5  Moderate: 8  Minor: 3

[critical] image-alt: Images must have alternate text
  WCAG: wcag2a, wcag111
  Elements affected: 4
    → img.hero-banner
      Fix: Element does not have an alt attribute

[serious] color-contrast: Elements must meet minimum color contrast ratio
  WCAG: wcag2aa, wcag143
  Elements affected: 12
    → .nav-link
      Fix: Element has insufficient color contrast
════════════════════════════════════════════════════
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Every violation tells you (a sample JSON entry follows the list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The rule that failed
&lt;/li&gt;
&lt;li&gt;The severity (critical, serious, moderate, minor)
&lt;/li&gt;
&lt;li&gt;Which WCAG criterion it violates
&lt;/li&gt;
&lt;li&gt;The CSS selector of the affected element&lt;/li&gt;
&lt;li&gt;How to fix it
&lt;/li&gt;
&lt;/ul&gt;
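&lt;p&gt;Because the scan is backed by axe-core, a single entry in the JSON output follows that shape closely. A hypothetical, trimmed example; the exact field names in wcag-audit's report may differ:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "id": "image-alt",
  "impact": "critical",
  "tags": ["wcag2a", "wcag111"],
  "help": "Images must have alternate text",
  "nodes": [
    {
      "target": ["img.hero-banner"],
      "failureSummary": "Element does not have an alt attribute"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;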

&lt;h2&gt;
  
  
  Scan an Entire Site
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wcag-audit crawl https://example.gov/sitemap.xml --max-pages 50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Reads the sitemap, scans each page, and produces a consolidated report.&lt;/p&gt;
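&lt;p&gt;The sitemap step is less magic than it sounds. A hypothetical sketch of that part (not the tool's actual code, and assuming Node 18+ for the global fetch):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Sketch only: pull page URLs out of a sitemap, capped at maxPages, then scan each one.
async function urlsFromSitemap(sitemapUrl, maxPages = 50) {
  const xml = await (await fetch(sitemapUrl)).text();
  const locs = [...xml.matchAll(/&amp;lt;loc&amp;gt;\s*([^&amp;lt;]+?)\s*&amp;lt;\/loc&amp;gt;/g)]; // naive &amp;lt;loc&amp;gt; extraction
  return locs.map((m) =&amp;gt; m[1]).slice(0, maxPages);
}

// const urls = await urlsFromSitemap('https://example.gov/sitemap.xml', 50);
// for (const url of urls) { /* run the single-page scan from earlier */ }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;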

&lt;h2&gt;
  
  
  Drop it into CI
&lt;/h2&gt;

&lt;p&gt;The CLI exits with code 1 when violations are found:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# GitHub Actions
- name: Accessibility Audit
  run: npx wcag-audit scan ${{ env.DEPLOY_URL }} --level AA
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now accessibility is enforced on every deploy. No violations, no merge.&lt;/p&gt;

&lt;h2&gt;
  
  
  JSON and CSV for Reporting
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# JSON for programmatic use
wcag-audit scan https://example.gov --format json --output report.json

# CSV for spreadsheets and compliance reports
wcag-audit scan https://example.gov --format csv --output report.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
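&lt;p&gt;If you'd rather gate a pipeline on the JSON file than use the library directly, a few lines of Node are enough. This assumes the file mirrors the result object shown in the library example below:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// check-report.js: hypothetical gate on report.json
const report = require('./report.json');

const critical = report.summary.impactBreakdown.critical;
console.log(`Critical violations: ${critical}`);
process.exit(critical &amp;gt; 0 ? 1 : 0);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;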

&lt;p&gt;The CSV is designed for compliance teams who need to track violations in spreadsheets and produce audit reports for management.&lt;/p&gt;

&lt;h2&gt;
  
  
  WCAG Levels
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Level A (minimum)
wcag-audit scan https://example.gov --level A

# Level AA (required for US federal, most common)
wcag-audit scan https://example.gov --level AA

# Level AAA (strictest)
wcag-audit scan https://example.gov --level AAA
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Most government sites need AA. The tool defaults to AA.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use it as a Library
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const { scanUrl, formatTextReport } = require('wcag-audit');

(async () =&amp;gt; {
  const results = await scanUrl('https://example.gov', {
    level: 'AA',
    viewport: { width: 1280, height: 720 },
  });

  if (results.summary.impactBreakdown.critical &amp;gt; 0) {
    console.error('Critical accessibility violations found!');
    process.exit(1);
  }
})();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Why I Built This
&lt;/h2&gt;

&lt;p&gt;I've been contributing to accessibility-related projects across multiple government agencies — the US Web Design System (USWDS), the UK's GOV.UK Frontend, Singapore's GovTech accessibility tool (oobee), and Grafana's colorblind-safe palette. Every one of these projects deals with the same problem: making sure websites are accessible to everyone.&lt;/p&gt;

&lt;p&gt;The tooling gap was obvious. Developers who care about accessibility shouldn't need to pay for the privilege of testing it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/dequelabs/axe-core" rel="noopener noreferrer"&gt;axe-core&lt;/a&gt; — the accessibility engine used by Google, Microsoft, and government agencies worldwide&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://pptr.dev/" rel="noopener noreferrer"&gt;Puppeteer&lt;/a&gt; — headless Chrome for reliable page rendering
&lt;/li&gt;
&lt;li&gt;Node.js — runs anywhere, no system dependencies beyond Chrome
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install -g wcag-audit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Or try without installing:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npx wcag-audit scan https://example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/vijaygovindaraja/wcag-audit" rel="noopener noreferrer"&gt;https://github.com/vijaygovindaraja/wcag-audit&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;npm: &lt;a href="https://www.npmjs.com/package/wcag-audit" rel="noopener noreferrer"&gt;https://www.npmjs.com/package/wcag-audit&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;The tool is MIT licensed. PRs welcome. If you work on a government site and this saves you time, I'd love to hear about it.&lt;/p&gt;

</description>
      <category>a11y</category>
      <category>cli</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
