<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Tova A</title>
    <description>The latest articles on Forem by Tova A (@tova501).</description>
    <link>https://forem.com/tova501</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3647054%2F23507e95-9b21-4581-9527-ab073e2fb1c9.png</url>
      <title>Forem: Tova A</title>
      <link>https://forem.com/tova501</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tova501"/>
    <language>en</language>
    <item>
      <title>Cleaning Up Complexity: Preprocessing Attribution Maps for Better Evaluation</title>
      <dc:creator>Tova A</dc:creator>
      <pubDate>Tue, 10 Feb 2026 11:33:26 +0000</pubDate>
      <link>https://forem.com/tova501/cleaning-up-complexity-preprocessing-attribution-maps-for-better-evaluation-6oi</link>
      <guid>https://forem.com/tova501/cleaning-up-complexity-preprocessing-attribution-maps-for-better-evaluation-6oi</guid>
      <description>&lt;p&gt;I wanted to compare attribution maps from different XAI methods for vision models, using the Complexity metric from the Quantus library.&lt;br&gt;&lt;br&gt;
The idea was simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If a heatmap looks clean and focused, it should have lower complexity than a noisy, scattered one.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In practice, that’s not what happened.&lt;br&gt;
Some maps that were visually sharp and localised got &lt;strong&gt;high&lt;/strong&gt; (bad) Complexity scores.&lt;br&gt;
Other maps that looked messy or stretched over the whole image got &lt;strong&gt;surprisingly low&lt;/strong&gt; scores.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnkd6pgrzmjvt8nvh6nzw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnkd6pgrzmjvt8nvh6nzw.png" alt="Heatmaps Comparison" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the left is &lt;strong&gt;Guided Backprop&lt;/strong&gt;, which spreads activation all over the image.&lt;br&gt;
On the right is &lt;strong&gt;Fusion Grad&lt;/strong&gt;, which is much more sparse and focused on the relevant structures.&lt;br&gt;
But in our initial setup, the Quantus &lt;strong&gt;Complexity&lt;/strong&gt; metric actually gave Fusion Grad a &lt;em&gt;worse&lt;/em&gt; (higher) complexity score than Guided Backprop – a clear mismatch between what we see and what the metric reports.&lt;/p&gt;

&lt;p&gt;The metric was doing exactly what it was defined to do — but it was reacting to things like scale, padding, resolution, and sign conventions, not just to the “shape” of the explanation.&lt;/p&gt;

&lt;p&gt;That’s when it became clear: before evaluating attribution maps, you need to &lt;strong&gt;standardise them&lt;/strong&gt;. Otherwise, you’re mostly comparing formatting differences between methods, not their actual behaviour.&lt;/p&gt;

&lt;p&gt;In this post, I’ll show how I preprocess raw attribution maps into a canonical, evaluation-ready form before passing them to Quantus metrics.&lt;/p&gt;

&lt;p&gt;At first I tried to “fix” this by using Quantus’s built-in &lt;code&gt;normalise_func&lt;/code&gt;, but it didn’t change the ranking in a meaningful way.&lt;br&gt;
The real issue wasn’t the overall scale – it was the &lt;strong&gt;pedestal&lt;/strong&gt;:&lt;br&gt;
both methods produced a low but non-zero activation almost everywhere in the image.&lt;br&gt;
Guided Backprop had a noisy background plus a pedestal, while Fusion Grad had a very thin, sharp signal &lt;strong&gt;on top of its own pedestal&lt;/strong&gt;.&lt;br&gt;
Complexity only sees “how much structure lives above zero”.&lt;br&gt;&lt;br&gt;
If you keep the pedestal, Fusion Grad’s thin signal sits on a wide plateau and ends up looking &lt;em&gt;more&lt;/em&gt; complex numerically than the noisier Guided Backprop map.&lt;/p&gt;
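&lt;p&gt;To see why a pedestal inflates the score, here is a tiny NumPy sketch (toy data, not our real maps). Quantus’s Complexity, following Bhatt et al. (2020), is essentially the Shannon entropy of each pixel’s fractional contribution to the total attribution mass:&lt;/p&gt;

```python
import numpy as np

def entropy_complexity(attr_map):
    # Shannon entropy of each pixel's fractional contribution to the
    # total attribution mass (the idea behind the Complexity metric).
    p = np.abs(attr_map).ravel()
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Toy 32x32 map: a thin, sharp signal ...
sparse = np.zeros((32, 32))
sparse[15:17, :] = 1.0                 # 64 "hot" pixels, everything else zero

# ... versus the exact same signal sitting on a small pedestal.
pedestal = sparse + 0.05

print(entropy_complexity(sparse))      # low: mass concentrated on the line
print(entropy_complexity(pedestal))    # higher: the pedestal spreads the mass
```

&lt;p&gt;The two maps are visually almost identical, yet the pedestal version is numerically more complex, which is exactly the mismatch we saw between Guided Backprop and Fusion Grad.&lt;/p&gt;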

&lt;p&gt;That’s why the next step was not “better normalisation”, but &lt;strong&gt;explicitly removing or reducing the pedestal&lt;/strong&gt; before computing Complexity.&lt;/p&gt;
&lt;h2&gt;
  
  
  Baseline-Subtraction Normalization
&lt;/h2&gt;

&lt;p&gt;Instead of relying on the default &lt;code&gt;normalise_func&lt;/code&gt;, I implemented a custom one that does two things per attribution map:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Baseline removal (pedestal):&lt;/strong&gt;&lt;br&gt;
Compute a low percentile (for example, the 20th percentile) and treat it as a baseline.&lt;br&gt;
Subtract this baseline from all values and clamp negatives to zero. This removes the global “pedestal” while keeping the meaningful peaks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;0–1 normalisation:&lt;/strong&gt;&lt;br&gt;
After baseline removal, rescale the map to the [0, 1] range so that Complexity sees something closer to a probability distribution per sample, instead of raw arbitrary units.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

def baseline_subtraction_norm(attr_map: np.ndarray,
                              baseline_quantile: float = 0.2) -&amp;gt; np.ndarray:
    """
    Normalize an attribution map for evaluation:
    1) subtract a low quantile as baseline (pedestal removal),
    2) clamp to &amp;gt;= 0,
    3) rescale to [0, 1].
    """
    # 1. pedestal removal
    baseline = np.quantile(attr_map, baseline_quantile)
    x = attr_map - baseline
    x = np.clip(x, a_min=0.0, a_max=None)

    # 2. scale to [0, 1]
    max_val = x.max()
    if max_val &amp;gt; 0:
        x = x / max_val
    return x

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You can then simply use the Quantus Complexity metric with your custom &lt;code&gt;normalise_func&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import quantus

complexity_metric = quantus.Complexity(
    abs=True,
    normalise=True,
    normalise_func=baseline_subtraction_norm,
)

scores = complexity_metric(
    model=model,
    x_batch=x_batch,      # input images
    y_batch=y_batch,      # targets
    a_batch=attr_maps,    # attribution maps
)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Below you can see the value distribution of Fusion Grad before and after pedestal removal.&lt;br&gt;
After subtracting the baseline, most background pixels are exactly zero, and the Complexity metric reacts much more to the actual structure around the defect line and contact.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flxd7u42ye8puv7xr4fql.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flxd7u42ye8puv7xr4fql.png" alt="Distributions of fusion_grad heatmap, before vs after normalization" width="774" height="288"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Best practice&lt;/strong&gt;&lt;br&gt;
Before applying quantitative metrics to attribution maps, make preprocessing explicit and consistent. Remove method-specific pedestals, standardize sign conventions, and rescale per sample. Otherwise, metrics like Complexity primarily measure implementation artefacts (background mass, padding, resolution) rather than explanatory structure.&lt;/p&gt;
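&lt;p&gt;As a sketch of what “explicit and consistent” can look like (function name and defaults are mine, not a standard API), the whole checklist fits in one per-sample function:&lt;/p&gt;

```python
import numpy as np

def canonicalise(attr_map, signed=False, baseline_quantile=0.2):
    # 1. Standardise the sign convention: by default keep magnitudes only,
    #    so different methods' positive/negative conventions match.
    x = attr_map if signed else np.abs(attr_map)
    # 2. Remove the method-specific pedestal (low-quantile baseline).
    x = np.clip(x - np.quantile(x, baseline_quantile), 0.0, None)
    # 3. Rescale per sample to [0, 1].
    m = x.max()
    return x / m if m > 0 else x
```

&lt;p&gt;Running every method’s raw maps through one function like this, before any metric, is what makes the resulting scores comparable at all.&lt;/p&gt;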

</description>
      <category>explainableai</category>
      <category>ai</category>
      <category>analytics</category>
      <category>python</category>
    </item>
    <item>
      <title>Turning Any Model into an XAI-Ready Model: Formats and Gradient Flow</title>
      <dc:creator>Tova A</dc:creator>
      <pubDate>Tue, 10 Feb 2026 11:32:50 +0000</pubDate>
      <link>https://forem.com/tova501/turning-any-model-into-an-xai-ready-model-formats-and-gradient-flow-4a9j</link>
      <guid>https://forem.com/tova501/turning-any-model-into-an-xai-ready-model-formats-and-gradient-flow-4a9j</guid>
      <description>&lt;p&gt;This post is based on work done during a joint &lt;strong&gt;Applied Materials&lt;/strong&gt; and &lt;strong&gt;Extra-Tech&lt;/strong&gt; bootcamp, where I built an XAI platform. &lt;br&gt;
I’d like to thank &lt;strong&gt;Shmuel Fine&lt;/strong&gt; (team leader) and &lt;strong&gt;Odeliah Movadat&lt;/strong&gt; (mentor) for their guidance and support throughout the project.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Gradient-Based XAI Sometimes “Works” but Tells You Nothing
&lt;/h2&gt;

&lt;p&gt;Gradient-based explainability methods (Grad-CAM, Guided Backprop, Integrated Gradients, etc.) are everywhere. &lt;br&gt;
In tutorials, you call a function, get a pretty heatmap, and move on. &lt;br&gt;
In a real project, it’s different. &lt;br&gt;
I was building an internal XAI platform that needed to work across: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Different ML frameworks (PyTorch, TensorFlow) &lt;/li&gt;
&lt;li&gt;Different vision tasks (classification, segmentation, regression, and custom industrial models) &lt;/li&gt;
&lt;li&gt;Different stored model formats and exports collected over time
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In theory, any gradient-based method should “just work” on top of these models. &lt;br&gt;
In practice, once we started running them, things got messy. Sometimes we got blank or obviously wrong heatmaps with no warning. Other times it failed loudly with: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;RuntimeError: element 0 of variables does not require grad and does not have a grad_fn &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The forward pass was correct, the predictions made sense — but the value we were backpropagating from was no longer connected to the gradient graph. &lt;/p&gt;

&lt;p&gt;That’s when it became clear: the main problem wasn’t any specific XAI algorithm, but the combination of model formats, conversions, and gradient flow.&lt;br&gt;
In other words, not every model file we could load was actually explainable or measurable. &lt;/p&gt;
&lt;h2&gt;
  
  
  What an XAI-Ready Model Actually Needs
&lt;/h2&gt;

&lt;p&gt;Very quickly, here’s the minimum a model needs in order to produce meaningful, measurable gradient-based explanations: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gradients that actually exist.&lt;/strong&gt; When we choose a scalar score (for example, a class logit) and call &lt;code&gt;backward()&lt;/code&gt;, the gradient must flow back to what we care about – either the input image or some internal layer. 
If that path is broken, the explainer can still return a heatmap-shaped tensor, but it’s not telling us anything real.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A score that stays inside the graph.&lt;/strong&gt; The value we backpropagate from has to be a tensor that’s still part of the computation graph. If it was turned into a Python number, passed through NumPy, or detached along the way, we’ve already lost the information XAI needs. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access to internal features for CAM-style methods.&lt;/strong&gt; For methods like Grad-CAM, we also need a way to read activations and gradients from a chosen internal layer – but that comes after the basic gradient path is in place. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This post is about how to work with real-world models &lt;strong&gt;without breaking these requirements&lt;/strong&gt;. &lt;/p&gt;
&lt;h2&gt;
  
  
  How Common Formats Behaved in Our Platform
&lt;/h2&gt;

&lt;p&gt;Once we knew what an XAI-ready model needs, we looked at what we actually had: ONNX exports, TorchScript files, and some legacy TensorFlow models. All of them were fine for inference. For gradient-based XAI, the picture was very different. &lt;/p&gt;

&lt;p&gt;This complexity really shows up when you’re not just loading your &lt;em&gt;own&lt;/em&gt; training code, but building a platform that has to accept &lt;strong&gt;unknown model architectures&lt;/strong&gt; from different teams.&lt;br&gt;
If you control the architecture, you can usually rebuild a clean eager model and just load a state_dict. If all you get is a stored artifact (ONNX, TorchScript, legacy TF graph), then the &lt;strong&gt;format itself&lt;/strong&gt; decides how much structure and gradient information you still have. That’s exactly the situation we were in. &lt;/p&gt;
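&lt;p&gt;For completeness, the “you control the architecture” path looks roughly like this (the tiny model and weight handling are illustrative, not our production code):&lt;/p&gt;

```python
import torch

class TinyNet(torch.nn.Module):
    # Stand-in architecture; in the real case this is your own training code.
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

# Rebuild a clean eager model, then load *weights only* into it.
# (state stands in for torch.load("weights.pt") of a saved state_dict.)
model = TinyNet()
state = TinyNet().state_dict()
model.load_state_dict(state)
model.eval()

# The result is a plain nn.Module: gradients and hooks both work.
x = torch.randn(1, 4, requires_grad=True)
model(x).sum().backward()
print(x.grad.shape)
```

&lt;p&gt;Because only tensors are deserialized, nothing about the graph or the module structure is lost along the way.&lt;/p&gt;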
&lt;h3&gt;
  
  
  ONNX – Great for Inference, Not for Gradients
&lt;/h3&gt;

&lt;p&gt;We had models already deployed as ONNX. It was tempting to reuse them for Grad-CAM and Integrated Gradients. In practice, ONNX runtimes are optimised for &lt;strong&gt;forward passes&lt;/strong&gt;, not for autograd: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You get fast, correct predictions. &lt;/li&gt;
&lt;li&gt;You don’t get a PyTorch/TF-style gradient graph or easy hooks into internal layers. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So ONNX became our conclusion: &lt;strong&gt;perfect for deployment, but not a reliable source for gradient-based explanations or metrics&lt;/strong&gt;. For XAI, we need the original framework model, not just the ONNX file. &lt;/p&gt;
&lt;h3&gt;
  
  
  TorchScript – fine for simple gradients, fragile for CAM-style methods
&lt;/h3&gt;

&lt;p&gt;In our experience, TorchScript models do support gradients: if the export wasn’t heavily frozen or over-optimised, we can reliably backpropagate from a scalar score to the input and obtain meaningful gradient-based heatmaps.&lt;/p&gt;

&lt;p&gt;The problems appear when CAM-style methods require access to internal convolutional features. Some TorchScript exports fuse layers, inline modules, or alter module boundaries, so the convolutional blocks we want to hook are no longer explicitly exposed. In these cases, forward and backward hooks become fragile, and optimisation steps can effectively make internal activations inaccessible even though gradients still exist.&lt;/p&gt;

&lt;p&gt;Because of this, we treat TorchScript as acceptable for &lt;strong&gt;gradient-w.r.t-input&lt;/strong&gt; methods, but for CAM-style explainers we require the original eager &lt;code&gt;nn.Module&lt;/code&gt;, where internal layers remain cleanly and reliably accessible.&lt;/p&gt;
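&lt;p&gt;On an eager &lt;code&gt;nn.Module&lt;/code&gt;, the access CAM-style methods need is just a forward hook plus a gradient hook on the chosen layer. A minimal sketch with a toy model (not one of our real architectures):&lt;/p&gt;

```python
import torch
import torch.nn as nn

# Toy model: conv features followed by a pooled linear head.
model = nn.Sequential(
    nn.Conv2d(1, 4, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 2),
)

feats = {}

def save_activation(module, inputs, output):
    # Keep the activation tensor (still in the graph) and attach a hook
    # that captures the gradient flowing back through it.
    feats["act"] = output
    output.register_hook(lambda g: feats.__setitem__("grad", g))

handle = model[0].register_forward_hook(save_activation)

x = torch.randn(1, 1, 8, 8, requires_grad=True)
score = model(x)[:, 0].sum()
score.backward()
handle.remove()

print(feats["act"].shape, feats["grad"].shape)  # both torch.Size([1, 4, 8, 8])
```

&lt;p&gt;In a fused or inlined TorchScript export, the equivalent of &lt;code&gt;model[0]&lt;/code&gt; may simply not exist as an addressable module, which is exactly why we fall back to the eager model for CAM-style explainers.&lt;/p&gt;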
&lt;h3&gt;
  
  
  Native PyTorch / TF – Our XAI-Ready Baseline
&lt;/h3&gt;

&lt;p&gt;After this, we decided that for explainability the “ground truth” formats are: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PyTorch &lt;code&gt;nn.Module&lt;/code&gt; in eager mode &lt;/li&gt;
&lt;li&gt;TF2/Keras models or SavedModels that work cleanly with &lt;code&gt;GradientTape&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All other artifacts (ONNX, unknown TorchScript, legacy TF graphs) are welcome for inference, but we don’t assume they are explainable until proven otherwise. &lt;/p&gt;

&lt;p&gt;We also learned that simply “converting back” from a non-differentiation-friendly format doesn’t magically fix things. &lt;br&gt;
You can end up with a PyTorch &lt;code&gt;nn.Module&lt;/code&gt; or a TF2 SavedModel that &lt;em&gt;looks&lt;/em&gt; clean, but was rebuilt from ONNX or an old TF1 graph using a script full of &lt;code&gt;.numpy()&lt;/code&gt; calls and manual tensor operations. &lt;br&gt;
On paper the format is now “good”, but the gradient path is still broken. &lt;br&gt;
For a deeper dive into how we converted legacy models without breaking gradients, see the companion post.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model format&lt;/th&gt;
&lt;th&gt;Gradient support&lt;/th&gt;
&lt;th&gt;CAM compatibility&lt;/th&gt;
&lt;th&gt;Recommended usage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ONNX&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ No autograd graph&lt;/td&gt;
&lt;td&gt;❌ Not supported&lt;/td&gt;
&lt;td&gt;Inference / deployment only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TorchScript&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Full (if loaded correctly)&lt;/td&gt;
&lt;td&gt;⚠️ Fragile (layers may be fused or hidden)&lt;/td&gt;
&lt;td&gt;Simple gradient methods when eager model is unavailable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Native PyTorch (eager)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Full&lt;/td&gt;
&lt;td&gt;✅ Full&lt;/td&gt;
&lt;td&gt;Gradient-based XAI and quantitative metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Native TF2 / Keras&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Full (&lt;code&gt;GradientTape&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;✅ Full&lt;/td&gt;
&lt;td&gt;Gradient-based XAI and quantitative metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Converted models (from ONNX / TF1)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;⚠️ Depends on conversion&lt;/td&gt;
&lt;td&gt;⚠️ Works if gradients are supported&lt;/td&gt;
&lt;td&gt;Treat as inference-only unless gradients are explicitly verified&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Don’t Break Gradients in Your Own Code
&lt;/h2&gt;

&lt;p&gt;Even with an XAI-ready format, it’s still easy to kill gradients in the parts we control: preprocessing, adapters, and forward passes. We saw a few recurring “self-inflicted” problems: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrapping the whole explanation call in &lt;code&gt;torch.no_grad()&lt;/code&gt; &lt;/li&gt;
&lt;li&gt;Calling &lt;code&gt;.detach()&lt;/code&gt; on tensors that are still needed for the XAI score &lt;/li&gt;
&lt;li&gt;Converting tensors to NumPy (&lt;code&gt;.cpu().numpy()&lt;/code&gt;) and then using those values to compute the score &lt;/li&gt;
&lt;li&gt;Using &lt;code&gt;.item()&lt;/code&gt; on logits and doing the rest of the logic in pure Python&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of these are harmless for plain inference, but they quietly break the gradient path that explanations rely on. &lt;br&gt;
Our rule of thumb became: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The main path from &lt;strong&gt;input → model → XAI score&lt;/strong&gt; must stay inside the framework’s autograd system. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If we really need to log or serialize something, we do it &lt;em&gt;after&lt;/em&gt; we’ve computed the score we’ll backpropagate from. &lt;/p&gt;
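&lt;p&gt;The difference is easy to demonstrate in a few lines (toy model; &lt;code&gt;target_class&lt;/code&gt; is a stand-in index):&lt;/p&gt;

```python
import torch

# Toy model, not from the original post.
model = torch.nn.Linear(4, 3)
x = torch.randn(1, 4, requires_grad=True)
target_class = 1

logits = model(x)

# Broken: .item() returns a plain Python float, outside the autograd graph.
bad_score = logits[0, target_class].item()

# Intact: keep the score as a tensor, backpropagate, log only afterwards.
score = logits[0, target_class]
score.backward()
print("logged after backward:", score.item(), x.grad is not None)
```

&lt;p&gt;Both branches compute the same number; only the second one leaves anything for the explainer to differentiate.&lt;/p&gt;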
&lt;h3&gt;
  
  
  Bonus: A Tiny Sanity Check for New Models/Adapters
&lt;/h3&gt;

&lt;p&gt;Whenever we plug a new model or adapter into the platform, we run a quick check: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load the model in an XAI-ready format (eager PyTorch / TF2). &lt;/li&gt;
&lt;li&gt;Pick a simple score (for example, one class logit). &lt;/li&gt;
&lt;li&gt;Call &lt;code&gt;backward()&lt;/code&gt;. &lt;/li&gt;
&lt;li&gt;Verify that gradients at the input (or at the adapter boundary) are non-zero. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In PyTorch, this is just a few lines of code. If this fails with &lt;code&gt;does not have a grad_fn&lt;/code&gt; or always gives zero gradients, we usually don’t look at the explainer first – we look at the model format or the forward path we’ve built around it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import torch

model = torch.load("model.pt", map_location="cpu", weights_only=False)
model.eval()

x = sample_image.to("cpu")
x.requires_grad_(True)

logits = model(x)
score = logits[:, target_class].mean()

model.zero_grad()
score.backward()

print("Grad norm on input:", x.grad.norm().item())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If this prints a reasonable, non-zero gradient norm, the model is at least technically explainable for gradient-w.r.t-input methods.&lt;/p&gt;

&lt;p&gt;In practice, this became our first filter: if the model is fully differentiable, we keep going with gradient-based explanations and metrics. &lt;br&gt;
If not, we still allow inference – but we deliberately treat that model as &lt;strong&gt;inference-only, not explainable.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>explainableai</category>
      <category>captum</category>
    </item>
  </channel>
</rss>
