<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: sentinel-safety</title>
    <description>The latest articles on Forem by sentinel-safety (@sentinelsafety).</description>
    <link>https://forem.com/sentinelsafety</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3896006%2Fcd4d6221-c908-4583-af9b-5f6ec7055c2f.png</url>
      <title>Forem: sentinel-safety</title>
      <link>https://forem.com/sentinelsafety</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sentinelsafety"/>
    <language>en</language>
    <item>
      <title>Inside SENTINEL: How 13 Microservices Detect Child Grooming by Behavior, Not Keywords</title>
      <dc:creator>sentinel-safety</dc:creator>
      <pubDate>Sat, 25 Apr 2026 16:02:22 +0000</pubDate>
      <link>https://forem.com/sentinelsafety/inside-sentinel-how-13-microservices-detect-child-grooming-by-behavior-not-keywords-45m0</link>
      <guid>https://forem.com/sentinelsafety/inside-sentinel-how-13-microservices-detect-child-grooming-by-behavior-not-keywords-45m0</guid>
      <description>&lt;p&gt;This is a technical walkthrough of SENTINEL's architecture. If you want to understand how a behavioral child safety detection system actually works at the service level, this is for you.&lt;/p&gt;

&lt;p&gt;SENTINEL is a 13-microservice platform. Each service is independently deployable. You can start with just the event ingestion and risk scoring services, add the compliance layer when needed, and opt into federation later. Here's what each service does and why it exists as a separate service.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why microservices?
&lt;/h2&gt;

&lt;p&gt;Content moderation systems get bolted into platform infrastructure and then never changed. A monolithic design locks you into the same detection logic, the same compliance reporting format, and the same infrastructure footprint — even as your platform scales and your regulatory obligations evolve.&lt;/p&gt;

&lt;p&gt;SENTINEL's services are small, replaceable, and independently testable. A platform that wants to swap SENTINEL's linguistic model for their own detection model can do that without touching the audit log service or the NCMEC reporting pipeline. A platform that doesn't need the federation service doesn't deploy it.&lt;/p&gt;

&lt;p&gt;The 13 services group into five layers: ingestion, analysis, risk scoring, infrastructure, and output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ingestion layer
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Event API Service&lt;/strong&gt; is the single entry point. Platforms send behavioral events over REST: message sent, session started, relationship formed, contact frequency change. The service validates the schema, assigns a platform-specific event ID, and queues the event for the analysis layer. Webhook callbacks are supported for real-time risk score delivery.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SDK layer&lt;/strong&gt; is not a service itself, but the Python and Node.js SDKs abstract the API call. Most platforms integrate at the SDK level, not the raw API. The SDKs handle batching, retry logic, and async callback handling.&lt;/p&gt;
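
&lt;p&gt;As a rough sketch of what an event submission looks like at the REST level (the endpoint path and field names below are illustrative assumptions, not the published API; real integrations should go through the SDKs):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import requests

# Hypothetical endpoint and payload shape, shown for illustration only.
# The SDKs wrap this call and add batching, retries, and async callbacks.
resp = requests.post(
    "https://sentinel.example.internal/v1/events",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "event_type": "message",  # or session_start, relationship_formed, ...
        "sender_id": "user_abc",
        "recipient_id": "user_xyz",
        "timestamp": "2026-04-25T12:00:00Z",
    },
    timeout=5,
)
resp.raise_for_status()
event = resp.json()  # includes the platform-specific event ID assigned by the service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;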

&lt;h2&gt;
  
  
  Analysis layer
&lt;/h2&gt;

&lt;p&gt;These four services are the core of SENTINEL's behavioral detection. Each is independently scalable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linguistic Analysis Service&lt;/strong&gt; builds a session-by-session profile of how a user's communication style changes over time. It is not a keyword scanner. It watches register shifts — vocabulary level, formality, pronoun use, topic focus — and compares them against session history to detect the style changes associated with manufactured intimacy. The model runs on behavioral metadata about language rather than on raw message content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Graph Analysis Service&lt;/strong&gt; maintains a social graph for each platform: who communicates with whom, at what frequency, and through which channel types. It detects coordinated targeting (multiple accounts approaching the same minor), asymmetric relationship formation (high contact frequency on one side), and escalation from group channels to private channels. Graph signals are some of the most reliable indicators of grooming intent — they are hard to game because they reflect structural behavior, not surface-level content choices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Temporal Analysis Service&lt;/strong&gt; watches time-domain signals: contact frequency acceleration, unusual-hours patterns, cross-session escalation velocity. A user who contacts a minor three times in week one, eight times in week two, and daily by week three is exhibiting a velocity pattern. The temporal service tracks this trajectory across sessions and integrates with the risk scoring aggregator to weight recent escalation more heavily.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fairness Evaluation Service&lt;/strong&gt; does not produce risk scores. It runs before any detection model deploys and computes demographic parity metrics across the user population. If the linguistic, graph, or temporal models produce false positive rates that differ significantly across demographic groups, this service blocks deployment. Once deployed, it runs periodic re-evaluation to catch drift.&lt;/p&gt;

&lt;h2&gt;
  
  
  Risk scoring layer
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Risk Score Aggregator&lt;/strong&gt; takes outputs from the linguistic, graph, and temporal services and combines them into a unified risk score between 0 and 100. The combination is not a simple average: each signal layer is independently weighted, and the aggregation logic is configurable per platform. The aggregator also produces the plain-language explanation that accompanies each score — synthesizing the specific signals that contributed, in a format that a human moderator can read and a court can understand.&lt;/p&gt;

&lt;p&gt;The risk score aggregator assigns a tier label: &lt;strong&gt;trusted&lt;/strong&gt; (0–29), &lt;strong&gt;watch&lt;/strong&gt; (30–59), &lt;strong&gt;restrict&lt;/strong&gt; (60–84), and &lt;strong&gt;critical&lt;/strong&gt; (85–100). These thresholds are configurable.&lt;/p&gt;
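
&lt;p&gt;A minimal sketch of the tier mapping under those defaults (the helper name is ours, not the aggregator's actual interface):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Default tier boundaries from above; configurable per platform.
TIERS = [(0, 29, "trusted"), (30, 59, "watch"), (60, 84, "restrict"), (85, 100, "critical")]

def tier_for(score: int) -&amp;gt; str:
    """Map a 0-100 risk score to its tier label."""
    for low, high, label in TIERS:
        if low &amp;lt;= score &amp;lt;= high:
            return label
    raise ValueError(f"score out of range: {score}")

assert tier_for(47) == "watch"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;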

&lt;h2&gt;
  
  
  Infrastructure layer
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Audit Log Service&lt;/strong&gt; maintains SENTINEL's tamper-evident audit chain. Every risk score, every model deployment decision, every fairness evaluation, and every compliance export is written to a cryptographically chained log. Records cannot be altered without detection. Retention is seven years by default, configurable for jurisdictions requiring longer retention. This is the primary documentation artifact for regulatory audit requests.&lt;/p&gt;
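
&lt;p&gt;The chaining property is simple to illustrate. A generic hash-chain append looks roughly like this (a sketch of the technique, not SENTINEL's actual record format):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import json
import time

def append_record(log: list, payload: dict) -&amp;gt; dict:
    """Append a record whose hash covers the previous record's hash,
    so altering any earlier entry invalidates every hash after it."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    record = {"ts": time.time(), "payload": payload, "prev_hash": prev_hash}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    log.append(record)
    return record
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;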

&lt;p&gt;&lt;strong&gt;Federation Service&lt;/strong&gt; manages the opt-in cross-platform threat intelligence network. When a platform confirms a grooming case (human-reviewed), the federation service generates a behavioral signature — a non-reversible vector representation of the behavioral pattern — and submits it to the federation pool. When analyzing new users, the service queries whether their behavioral profile matches any known signature. No user PII or message content crosses platform boundaries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Retention and Erasure Service&lt;/strong&gt; handles GDPR Article 17 erasure requests, COPPA deletion requirements, and jurisdiction-aware data retention policies. When a user deletion request arrives, this service coordinates with the other services to remove personal data while preserving the audit log integrity required for compliance. The audit log entries are pseudonymized rather than deleted, maintaining the evidentiary chain while honoring erasure obligations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Output layer
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;NCMEC Reporting Service&lt;/strong&gt; assembles CyberTipline evidence packages when behavioral indicators meet mandatory reporting thresholds. The package includes the structured event timeline, risk score history, platform context, and whatever user metadata is required for the report. Platform operators review and file; SENTINEL prepares the documentation. This service integrates with the audit log to ensure the evidence package and the audit record are consistent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Moderation Dashboard Service&lt;/strong&gt; presents the moderation queue to platform trust and safety teams. Flagged users appear with their risk scores, tier labels, and plain-language explanations. Moderators can review the behavioral signal history, take action, and record the outcome. The service feeds outcomes back into the audit log.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compliance Export Service&lt;/strong&gt; generates structured documentation for regulatory submissions: risk assessment records for DSA Article 28 compliance, transparency reports, and audit extracts. These are exportable in machine-readable formats compatible with the EU's Digital Services Act transparency database requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  How services communicate
&lt;/h2&gt;

&lt;p&gt;Within a SENTINEL deployment, services communicate over a message queue (Redis by default) for asynchronous analysis jobs and over REST for synchronous queries. The event API places analysis jobs on the queue; each analysis service processes them and writes results to PostgreSQL. The risk score aggregator subscribes to completed analysis outputs and triggers score generation.&lt;/p&gt;
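
&lt;p&gt;In outline, the hand-off works like this (queue and field names are illustrative, not the actual schema):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import redis

r = redis.Redis()

# Event API side: enqueue an analysis job
r.lpush("analysis:jobs", json.dumps({"event_id": "evt_123", "user_id": "user_abc"}))

# Analysis service side: blocking pop, analyze, persist
_queue, raw = r.brpop("analysis:jobs")
job = json.loads(raw)
# ...run the linguistic/graph/temporal analysis, write results to PostgreSQL,
# and signal the risk score aggregator that this job is complete
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;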

&lt;p&gt;Federation queries are synchronous REST calls to the federation service (with caching for high-frequency platforms). Audit log writes are append-only over a dedicated internal API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting small
&lt;/h2&gt;

&lt;p&gt;You do not need to deploy all 13 services. The minimum viable deployment is the Event API, the three analysis services (linguistic, graph, temporal), and the risk score aggregator. This gives you behavioral risk scoring with plain-language explanations.&lt;/p&gt;

&lt;p&gt;Add the audit log service for compliance infrastructure. Add the NCMEC reporting service when mandatory reporting becomes relevant. Add the federation service when your platform is large enough to benefit from cross-platform threat intelligence.&lt;/p&gt;

&lt;p&gt;The Docker Compose configuration in the repository defines the full stack. Individual services can be commented out for minimal deployments.&lt;/p&gt;




&lt;p&gt;SENTINEL is open source. Every service's code, model training scripts, and data handling policy is in the repository.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/sentinel-safety/SENTINEL" rel="noopener noreferrer"&gt;https://github.com/sentinel-safety/SENTINEL&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Free for platforms under $100k annual revenue and all non-commercial and research use.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>python</category>
    </item>
    <item>
      <title>Fairness in Child Safety AI: Why Demographic Parity Audits Are Not Optional</title>
      <dc:creator>sentinel-safety</dc:creator>
      <pubDate>Sat, 25 Apr 2026 16:00:50 +0000</pubDate>
      <link>https://forem.com/sentinelsafety/fairness-in-child-safety-ai-why-demographic-parity-audits-are-not-optional-2ic8</link>
      <guid>https://forem.com/sentinelsafety/fairness-in-child-safety-ai-why-demographic-parity-audits-are-not-optional-2ic8</guid>
      <description>&lt;p&gt;Most machine learning systems for content moderation are built, evaluated on accuracy metrics, and deployed. Fairness evaluation is treated as a nice-to-have, or skipped entirely.&lt;/p&gt;

&lt;p&gt;In child safety specifically, this is a serious problem — and not just for ethical reasons. Systems that flag one demographic group disproportionately cause real harm to the falsely flagged users, create legal exposure for the platform, and undermine public trust in automated moderation. They also tend to miss threats in underrepresented groups.&lt;/p&gt;

&lt;p&gt;SENTINEL treats fairness differently: demographic parity is a hard deployment constraint. No model ships if it fails. This post explains why, and how it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  The specific failure mode
&lt;/h2&gt;

&lt;p&gt;Content moderation datasets are biased. This is almost universally true, for several converging reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Historical reports are not uniformly distributed.&lt;/strong&gt; Platforms receive more reports from users who are most engaged with reporting tools, which skews toward certain demographics. Communities that distrust platforms report less. Communities that have historically been moderated more heavily are more represented in training labels.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Language patterns differ by demographics.&lt;/strong&gt; Models trained to detect linguistic patterns associated with grooming may learn correlates that happen to be more common in speech patterns associated with certain ethnic, regional, or age groups — completely independent of actual risk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Synthetic datasets carry sampling bias.&lt;/strong&gt; When real data is unavailable and researchers generate synthetic grooming datasets for training, the synthetic data reflects the assumptions of whoever wrote it.&lt;/p&gt;

&lt;p&gt;The result: a model trained on historical moderation data may produce substantially different false positive rates across demographic groups. Applied to a production platform, this means some user populations are flagged at rates 2x, 3x, or higher than others — with no actual difference in risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters specifically for child safety
&lt;/h2&gt;

&lt;p&gt;In most content moderation contexts, a false positive means an innocuous post is removed or a legitimate user is temporarily suspended. That's bad, but recoverable.&lt;/p&gt;

&lt;p&gt;In child safety moderation, the stakes are higher on both sides. A false negative leaves a minor exposed to an escalating interaction. A false positive doesn't just inconvenience a user: it can result in account termination and may even trigger law enforcement contact. The reputational, legal, and personal consequences of being incorrectly flagged as a potential predator are severe.&lt;/p&gt;

&lt;p&gt;This creates a specific obligation: child safety AI needs to be demonstrably fair across demographic groups, not just accurate overall.&lt;/p&gt;

&lt;p&gt;Regulators are arriving at the same conclusion. The EU DSA's algorithmic accountability provisions (Articles 34-35) include requirements to assess systemic risks that arise from the design of automated systems, including risks related to fundamental rights. A system that disproportionately flags users from minority groups creates exactly this kind of systemic risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demographic parity as a deployment gate
&lt;/h2&gt;

&lt;p&gt;Most AI fairness work happens after deployment: models are built, deployed, and then audited to see if they've produced disparate impact. By then, the harm is already in production.&lt;/p&gt;

&lt;p&gt;SENTINEL takes a different approach: the fairness audit runs before deployment, and passing it is required.&lt;/p&gt;

&lt;p&gt;Specifically, before any detection model is deployed on a tenant platform, SENTINEL runs a demographic parity evaluation across the platform's user population. The evaluation measures the false positive rate across demographic groups (age, gender, and any additional demographic signals available from the platform's user data).&lt;/p&gt;

&lt;p&gt;If the false positive rate differs across groups by more than a configurable threshold (default: 10 percentage points), deployment is blocked. The model is not rolled out gradually, not deployed with a warning, and not deployed with a note in the audit log. It cannot ship.&lt;/p&gt;

&lt;p&gt;The platform receives a fairness report explaining which demographic segment has the elevated false positive rate, the magnitude of the disparity, and recommendations for retraining or re-weighting the model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a gate, not a dashboard
&lt;/h2&gt;

&lt;p&gt;A common question: why not just show a fairness dashboard and let the platform decide?&lt;/p&gt;

&lt;p&gt;Three reasons:&lt;/p&gt;

&lt;p&gt;First, the decision should not be delegated to individual platform operators. A platform under regulatory scrutiny may face strong pressure to deploy quickly. A compliance gate removes the pressure. The system enforces the standard regardless of business timelines.&lt;/p&gt;

&lt;p&gt;Second, fairness metrics are not intuitive, and disparate impact is easy to rationalize. "Our overall accuracy is 94% and the disparity is only 8 percentage points" sounds reasonable until you recognize that an 8-point disparity in false positive rate can mean one user group is incorrectly flagged at roughly double the rate of another (8% versus 16%, for example). A gate makes the threshold explicit and enforceable.&lt;/p&gt;

&lt;p&gt;Third, regulator expectations are moving toward architectural enforcement. The EU DSA and UK Online Safety Act both require risk mitigation measures, not just risk assessment. A deployment gate provides a documentable, auditable enforcement mechanism that a risk assessment dashboard does not.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical implementation
&lt;/h2&gt;

&lt;p&gt;The fairness gate in SENTINEL works in three stages:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pre-deployment evaluation:&lt;/strong&gt; When a tenant installs a new detection model (or updates an existing one), SENTINEL runs the model against a balanced evaluation set drawn from the platform's historical behavioral data. The evaluation set is stratified by demographic group to ensure sufficient representation of each group for meaningful statistical comparison.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disparity measurement:&lt;/strong&gt; The gate computes false positive rate for each demographic group and computes the maximum pairwise disparity. It also computes the false negative rate (missed true positives) across groups, since fairness cuts both ways: a model that misses threats in one demographic group while detecting them in others fails fairness criteria as well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pass/fail determination:&lt;/strong&gt; If the maximum pairwise disparity in false positive rate or false negative rate exceeds the configured threshold, the model is marked as failed and cannot be deployed. The gate produces a detailed report: which groups were compared, what the measured rates were, and how far the model fell outside the threshold.&lt;/p&gt;
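
&lt;p&gt;A sketch of the disparity computation and the gate decision, using the 10-percentage-point default (helper names are ours):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from itertools import combinations

def max_pairwise_disparity(rates: dict) -&amp;gt; float:
    """Largest absolute gap between any two groups' rates."""
    return max(abs(a - b) for a, b in combinations(rates.values(), 2))

def fairness_gate_passes(fpr: dict, fnr: dict, threshold: float = 0.10) -&amp;gt; bool:
    """Both false positive and false negative disparities must stay within the threshold."""
    return (max_pairwise_disparity(fpr) &amp;lt;= threshold
            and max_pairwise_disparity(fnr) &amp;lt;= threshold)

# An 18-point false positive rate gap between groups blocks deployment:
assert not fairness_gate_passes({"group_a": 0.06, "group_b": 0.24},
                                {"group_a": 0.10, "group_b": 0.12})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;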

&lt;h2&gt;
  
  
  What happens when a model fails
&lt;/h2&gt;

&lt;p&gt;When a model fails the fairness gate, the platform receives a report and works to bring the model into compliance. The most common interventions are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reweighting the training data&lt;/strong&gt; to correct for underrepresentation of particular groups.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Calibration adjustments&lt;/strong&gt; to reduce systematic score inflation for specific groups.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Feature engineering:&lt;/strong&gt; if specific features are driving disparate impact, those features may need to be removed or replaced.&lt;/p&gt;

&lt;p&gt;In some cases, the training dataset is simply inadequate for producing a fair model, and the model needs to be retrained with better data. The fairness gate catches this before it becomes a production problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fairness-accuracy tradeoff
&lt;/h2&gt;

&lt;p&gt;A frequent objection: doesn't imposing fairness constraints reduce overall accuracy?&lt;/p&gt;

&lt;p&gt;In practice, for behavioral detection specifically, models that produce disparate impact are usually not more accurate overall; they are reflecting bias in the training data. Correcting for that bias tends to improve calibration across the board.&lt;/p&gt;

&lt;p&gt;There is a theoretical tradeoff: in some scenarios, constrained optimization for fairness does reduce the optimized accuracy metric. SENTINEL's position is that this tradeoff is acceptable and, in the child safety context, required. A system with 93% accuracy and equitable false positive rates is better than a system with 95% accuracy that disproportionately flags one demographic group.&lt;/p&gt;

&lt;p&gt;The regulatory and ethical case for accepting this tradeoff is strong. The legal case is becoming clearer as enforcement under DSA and OSA develops.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting to audit infrastructure
&lt;/h2&gt;

&lt;p&gt;The fairness gate doesn't operate in isolation. Every fairness evaluation run is logged in SENTINEL's tamper-evident audit log, including the model version, the evaluation dataset, the demographic groups evaluated, the measured disparity rates, and the pass/fail outcome.&lt;/p&gt;

&lt;p&gt;This creates an auditable record that the platform took fairness evaluation seriously. When a regulator asks how the platform ensured its automated systems did not produce disparate impact, this log is the answer.&lt;/p&gt;




&lt;p&gt;The fairness gate is part of SENTINEL's core platform. It applies to all detection models on all tenant platforms, with no opt-out.&lt;/p&gt;

&lt;p&gt;SENTINEL is an open-source behavioral intelligence platform for child safety compliance. Free for platforms under $100k annual revenue.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/sentinel-safety/SENTINEL" rel="noopener noreferrer"&gt;https://github.com/sentinel-safety/SENTINEL&lt;/a&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>ai</category>
    </item>
    <item>
      <title>What EU DSA and UK Online Safety Act require from your platform's child safety infrastructure</title>
      <dc:creator>sentinel-safety</dc:creator>
      <pubDate>Sat, 25 Apr 2026 15:56:57 +0000</pubDate>
      <link>https://forem.com/sentinelsafety/what-eu-dsa-and-uk-online-safety-act-require-from-your-platforms-child-safety-infrastructure-3ih6</link>
      <guid>https://forem.com/sentinelsafety/what-eu-dsa-and-uk-online-safety-act-require-from-your-platforms-child-safety-infrastructure-3ih6</guid>
      <description>&lt;p&gt;Building a platform where kids might be present? The regulatory landscape changed substantially in 2024 and 2025, and the compliance obligations are more specific than many developers realize.&lt;/p&gt;

&lt;p&gt;This is a practical breakdown of what the EU Digital Services Act and UK Online Safety Act actually require at the technical level, and what compliant infrastructure looks like.&lt;/p&gt;

&lt;h2&gt;
  
  
  Are you in scope?
&lt;/h2&gt;

&lt;p&gt;The EU Digital Services Act's child safety obligations (Article 28) apply to any online platform accessible to minors in the EU. "Accessible to minors" is the operative phrase: if children can access your service, you are in scope. You do not have to specifically market to children. The DSA came into full application in February 2024.&lt;/p&gt;

&lt;p&gt;The UK Online Safety Act takes a similar approach: services "likely to be accessed by children" in the UK fall under child safety duties. Ofcom is publishing a categorization register in July 2026 that will explicitly list which services are in scope.&lt;/p&gt;

&lt;p&gt;The practical implication: any platform with social features, chat, or user-generated content that children might encounter is likely subject to at least some of these obligations. The "we're too small to worry about it" era is over.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "proactive safety" actually means
&lt;/h2&gt;

&lt;p&gt;Both the DSA and UK OSA require proactive rather than reactive child safety measures. This is a meaningful distinction.&lt;/p&gt;

&lt;p&gt;Reactive safety means: a child reports something harmful, the platform reviews it and takes action. This is the baseline that most platforms operate at today.&lt;/p&gt;

&lt;p&gt;Proactive safety means: the platform has systems in place to identify and intervene before harm occurs, based on risk assessment and systematic monitoring.&lt;/p&gt;

&lt;p&gt;Specifically, Article 28 of the DSA requires platforms accessible to minors to put in place appropriate and proportionate measures to ensure a high level of privacy, safety, and security for minors, and Articles 34-35 require the largest platforms to assess and mitigate systemic risks to minors. The UK OSA requires services to be "safe by design," with proactive systems rather than purely response-based moderation.&lt;/p&gt;

&lt;p&gt;Keyword filters, even sophisticated ones, are primarily reactive. Predators have adapted to them. They avoid flagged terms, use coded language, and spend weeks or months establishing trust before anything overtly harmful appears in message content. By the time a keyword filter triggers, the grooming process has often already advanced significantly.&lt;/p&gt;

&lt;p&gt;What satisfies proactive requirements is behavioral monitoring: watching how interactions evolve over time, identifying escalation patterns early, and surfacing risk before explicit content appears.&lt;/p&gt;

&lt;h2&gt;
  
  
  The audit trail requirement
&lt;/h2&gt;

&lt;p&gt;Both regulations require platforms to demonstrate compliance, which means documentation and audit trails are mandatory, not optional.&lt;/p&gt;

&lt;p&gt;DSA Article 28 requires platforms to produce documentation of their risk assessments and mitigation measures. Regulators can demand this evidence. The record-keeping obligation extends across multiple years.&lt;/p&gt;

&lt;p&gt;The UK Online Safety Act requires similar audit readiness. Ofcom has enforcement powers including substantial fines, and audit evidence demonstrating proactive safety measures is central to establishing compliance.&lt;/p&gt;

&lt;p&gt;For legal proceedings involving child exploitation or grooming, courts and law enforcement also require documentation: who was flagged, what behavioral evidence supported the flag, what action was taken, and when. This documentation needs to be tamper-evident, meaning the platform cannot alter records after the fact without detection.&lt;/p&gt;

&lt;p&gt;Cryptographically chained audit logs, retained for at least seven years, satisfy both the regulatory audit requirements and the legal evidence standards.&lt;/p&gt;

&lt;h2&gt;
  
  
  The mandatory reporting infrastructure
&lt;/h2&gt;

&lt;p&gt;Platforms operating in the US have mandatory reporting obligations under 18 U.S.C. § 2258A: if a platform becomes aware of apparent child sexual exploitation material, it must report to the National Center for Missing and Exploited Children (NCMEC) CyberTipline. Failure to report is a criminal offense.&lt;/p&gt;

&lt;p&gt;The NCMEC reporting process requires specific documentation: user information, timestamps, platform context, and the flagged content. Generating these evidence packages manually is error-prone and slow. Compliance infrastructure should automate this documentation so that when a platform files a report, the evidence package is ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  The GDPR and COPPA intersection
&lt;/h2&gt;

&lt;p&gt;Platforms serving users across jurisdictions face an intersection problem. COPPA (US) applies to platforms collecting personal information from children under 13. GDPR (EU) applies to personal data of EU residents, with heightened protections for children's data. The UK post-Brexit equivalent maintains similar protections.&lt;/p&gt;

&lt;p&gt;These frameworks have different requirements around data retention, parental consent, and erasure. A platform operating internationally needs to satisfy all of them simultaneously. The infrastructure for this includes jurisdiction-aware data retention policies, automated erasure workflows for deletion requests, parental consent mechanisms and records, and separation of data handling for minors versus adult users.&lt;/p&gt;

&lt;h2&gt;
  
  
  What compliant infrastructure actually needs
&lt;/h2&gt;

&lt;p&gt;Pulling this together, a platform taking its child safety compliance obligations seriously needs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Proactive behavioral detection that identifies escalation patterns before explicit harm occurs&lt;/li&gt;
&lt;li&gt;Tamper-evident audit logs retained for at least seven years, cryptographically chained so records cannot be altered&lt;/li&gt;
&lt;li&gt;Risk assessment documentation recording what the platform assessed and what mitigations were implemented&lt;/li&gt;
&lt;li&gt;NCMEC CyberTipline evidence packages generated automatically when reportable content is identified&lt;/li&gt;
&lt;li&gt;Jurisdiction-aware data handling covering GDPR, COPPA, and UK data protection requirements&lt;/li&gt;
&lt;li&gt;Explainable moderation decisions so human moderators and regulators can understand why a user was flagged&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The compliance gap
&lt;/h2&gt;

&lt;p&gt;This infrastructure has historically been expensive to build and only accessible to large platforms. GDPR compliance consultants, behavioral detection systems, and audit infrastructure are not cheap.&lt;/p&gt;

&lt;p&gt;This creates a genuine problem: the largest platforms have dedicated trust and safety teams and reasonable compliance budgets. Smaller platforms, often the ones children encounter in gaming, social, and creative communities, have almost nothing.&lt;/p&gt;

&lt;p&gt;The DSA and UK OSA apply to smaller platforms too. The July 2026 Ofcom categorization register and continued DSA enforcement will make this increasingly difficult to ignore.&lt;/p&gt;

&lt;h2&gt;
  
  
  SENTINEL
&lt;/h2&gt;

&lt;p&gt;We built SENTINEL as an open-source reference implementation for exactly this compliance stack. It ships with behavioral detection across four signal types (linguistic, graph, temporal, fairness); a demographic parity enforcement gate that blocks deployment if the detection model disproportionately flags any group; tamper-evident, cryptographically chained audit logs with seven-year default retention; automated NCMEC CyberTipline evidence package generation; and jurisdiction-aware GDPR and COPPA data handling.&lt;/p&gt;

&lt;p&gt;Every risk score comes with a plain-language explanation of the specific behavioral signals that triggered it, so moderators and regulators can understand and document the decision.&lt;/p&gt;

&lt;p&gt;SENTINEL is free for platforms under $100k annual revenue and all non-commercial and research use. Fully open source.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/sentinel-safety/SENTINEL" rel="noopener noreferrer"&gt;https://github.com/sentinel-safety/SENTINEL&lt;/a&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>opensource</category>
      <category>privacy</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Grooming operates over time. Here's how behavioral detection tracks it.</title>
      <dc:creator>sentinel-safety</dc:creator>
      <pubDate>Sat, 25 Apr 2026 15:51:41 +0000</pubDate>
      <link>https://forem.com/sentinelsafety/grooming-operates-over-time-heres-how-behavioral-detection-tracks-it-fb1</link>
      <guid>https://forem.com/sentinelsafety/grooming-operates-over-time-heres-how-behavioral-detection-tracks-it-fb1</guid>
      <description>&lt;p&gt;Every system designed to detect child grooming has the same problem: it's looking at the wrong unit of analysis.&lt;/p&gt;

&lt;p&gt;Grooming doesn't happen in a message. It happens across weeks of messages — a slow accumulation of trust, a gradual shift in conversational register, an escalation in contact frequency that would look unremarkable if you sampled any individual session but reads clearly as a pattern when you step back and look at the whole trajectory.&lt;/p&gt;

&lt;p&gt;When you build a detection system around message-level classification, you're designing for a problem that doesn't exist. Predators don't send a message that contains the whole grooming attempt. They send a hundred messages across a month, each one just slightly further than the last.&lt;/p&gt;

&lt;p&gt;This post is about how temporal signal analysis changes the problem — and specifically, how SENTINEL's temporal layer works.&lt;/p&gt;




&lt;h2&gt;
  
  
  What keyword filters see
&lt;/h2&gt;

&lt;p&gt;A keyword filter has a view like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[message] → [classifier] → flag / no flag
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each message is independent. The system has no memory. What happened in last Tuesday's session doesn't affect how it evaluates today's message.&lt;/p&gt;

&lt;p&gt;This maps cleanly onto spam detection, where the signals that make a message spam are usually present in the message itself. It maps badly onto grooming, where the signal is the &lt;em&gt;shape of behavior over time&lt;/em&gt;, not the content of individual messages.&lt;/p&gt;
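
&lt;p&gt;A trajectory-based system, in the same notation, looks more like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[events across sessions] → [behavioral profile] → [trajectory features] → flag / no flag
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;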

&lt;p&gt;A systematic review of the grooming detection literature by An et al. (arXiv:2503.05727, 2025) found that behavioral and temporal features are "consistently underexplored relative to linguistic features across the published literature" despite showing strong discriminative power in the studies that do use them. The architecture of most detection systems — trained on datasets of individual message excerpts — has driven the field toward a unit of analysis that the problem doesn't support.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the behavioral evidence actually shows
&lt;/h2&gt;

&lt;p&gt;Research on documented grooming cases consistently identifies a set of behavioral patterns that operate across sessions rather than within them:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Escalation velocity.&lt;/strong&gt; Grooming tends to follow a measurable escalation trajectory: initial low-stakes contact, relationship development, increasing intimacy and exclusivity, then requests for personal information, image sharing, or off-platform contact. The rate at which this escalation moves is a signal. Fast escalation from a new contact is a very different pattern from a years-long friendship.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contact frequency evolution.&lt;/strong&gt; Early in grooming, contact is typically sporadic and positioned as casual. As trust develops, contact frequency increases and becomes more purposeful. The shift from irregular to regular to daily to multiple-times-daily contact, across sessions rather than within a single session, is a behavioral signature.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session-bridging behavior.&lt;/strong&gt; Predators often end sessions in ways that create continuity with the next one — leaving threads open, referencing the next time they'll talk, creating a sense of ongoing relationship rather than discrete conversations. This cross-session threading is observable as a temporal pattern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Off-platform migration attempts.&lt;/strong&gt; Requests to move a conversation from a platform to a private channel (WhatsApp, Signal, Snapchat) tend to cluster at a specific point in the grooming trajectory, after sufficient trust has been established but before the predator feels confident enough to escalate overtly on the monitored platform. The timing of this request, relative to the arc of the relationship, is a signal.&lt;/p&gt;

&lt;p&gt;None of these patterns are visible in a message. They're only visible as trajectories.&lt;/p&gt;




&lt;h2&gt;
  
  
  How SENTINEL's temporal layer works
&lt;/h2&gt;

&lt;p&gt;SENTINEL analyzes user behavior across four signal layers: linguistic, graph, temporal, and fairness. The temporal layer is specifically designed to capture the escalation patterns that cross-session behavioral analysis makes visible.&lt;/p&gt;

&lt;p&gt;The core object is what we call the &lt;strong&gt;behavioral profile&lt;/strong&gt;: a rolling window of signals accumulated across sessions for a given user-to-user relationship or a given user's behavior on the platform. This profile is updated with each new event and used to compute temporal features.&lt;/p&gt;

&lt;p&gt;The key temporal signals SENTINEL tracks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Escalation velocity.&lt;/strong&gt; The rate at which the composite behavioral risk score is increasing over time. A user whose score has risen from 15 to 60 over three weeks looks very different from a user whose score reached 60 in a single session. The trajectory itself carries information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contact frequency gradient.&lt;/strong&gt; How the rate of contact between two users has changed over time. The first week of contact looked casual; by week four, there are multiple sessions per day. The gradient of this change is computed as a temporal signal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session boundary behavior.&lt;/strong&gt; How sessions end and begin. Does the conversation pick up immediately where it left off? Are there explicit continuity markers? Does the session-ending message create an open loop that the next session closes?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time-of-day pattern shifts.&lt;/strong&gt; Contact shifting to unusual hours — late night, early morning — is a known escalation marker. SENTINEL tracks whether the distribution of contact times has changed over the observation window.&lt;/p&gt;

&lt;p&gt;These signals are composited into a temporal risk contribution that's added to the overall behavioral risk score alongside the linguistic and graph signal contributions.&lt;/p&gt;
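
&lt;p&gt;As a concrete sketch, the contact frequency gradient can be computed from timestamps alone (window sizes and function names here are illustrative assumptions):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import datetime, timedelta

def contacts_per_day(events, start, days):
    """Contact rate over a window of `days` starting at `start`."""
    end = start + timedelta(days=days)
    return sum(1 for t in events if start &amp;lt;= t &amp;lt; end) / days

def frequency_gradient(events, now):
    """This week's contact rate relative to last week's."""
    recent = contacts_per_day(events, now - timedelta(days=7), 7)
    prior = contacts_per_day(events, now - timedelta(days=14), 7)
    if prior == 0:
        return float("inf") if recent else 0.0
    return recent / prior  # e.g. 4.2 means contact grew 4.2x week over week
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;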




&lt;h2&gt;
  
  
  The practical implication: why trajectory matters more than threshold
&lt;/h2&gt;

&lt;p&gt;The classic approach to classification systems is to set a threshold: if the confidence score exceeds X, flag the content. For message-level classifiers, this makes sense — the score reflects confidence in the single message being malicious.&lt;/p&gt;

&lt;p&gt;For temporal systems, the threshold intuition breaks down. The point is not whether today's message exceeds a threshold, but whether the &lt;em&gt;shape&lt;/em&gt; of behavior over time matches known grooming trajectories.&lt;/p&gt;

&lt;p&gt;SENTINEL scores users rather than messages. The risk score for a user reflects the accumulated weight of behavioral evidence across their entire history on the platform, with decay applied to older signals so that low-risk periods can recover a user's standing. A single suspicious message raises the score modestly. A sustained pattern of escalating contact, register shifts, and frequency increases over three weeks raises it substantially.&lt;/p&gt;

&lt;p&gt;This means a moderator review queue populated by SENTINEL's scores looks different from a queue populated by per-message classification scores. The cases at the top of the queue are there because of a behavioral trajectory — because something has been building, not because one message happened to cross a threshold.&lt;/p&gt;
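
&lt;p&gt;The decay mentioned above can be sketched as exponential weighting; the decay form and the 30-day half-life below are our assumptions, not SENTINEL's published parameters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math

def decayed_score(signals, half_life_days=30.0):
    """Sum (age_in_days, weight) signal contributions with exponential decay,
    so sustained low-risk periods let older evidence fade."""
    lam = math.log(2) / half_life_days
    return sum(w * math.exp(-lam * age) for age, w in signals)

# A signal observed 21 days ago retains about 62% of its original weight:
print(decayed_score([(21.0, 10.0)]))  # ~6.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;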




&lt;h2&gt;
  
  
  What explainability looks like for temporal signals
&lt;/h2&gt;

&lt;p&gt;One of SENTINEL's design requirements is that every risk score comes with a structured plain-language explanation of the signals that contributed to it. For temporal signals, this looks like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Contact frequency between this user and [target] has increased 4.2x over the past 21 days. Time-of-day distribution has shifted toward late evening hours. Risk score increased 18 points this week driven primarily by contact frequency escalation."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This explanation structure matters for two reasons.&lt;/p&gt;

&lt;p&gt;For human moderators: reviewing a case with this context is fundamentally different from reviewing a number. The moderator understands &lt;em&gt;why&lt;/em&gt; the system flagged this user, can evaluate whether the behavioral trajectory matches their knowledge of the specific situation, and can make a better decision about what action, if any, is warranted.&lt;/p&gt;

&lt;p&gt;For legal defensibility: if a moderation action is challenged — or if the platform needs to document its proactive detection methodology for DSA or UK Online Safety Act audit purposes — a structured explanation of the behavioral trajectory is far more useful than a classifier confidence score.&lt;/p&gt;




&lt;h2&gt;
  
  
  The data problem
&lt;/h2&gt;

&lt;p&gt;The honest limitation of temporal detection is that it requires time-series data, which creates challenges that message-level systems don't face.&lt;/p&gt;

&lt;p&gt;Most academic grooming detection datasets are collections of chat logs — often the PAN12 benchmark dataset — without full temporal context. Training and evaluating temporal detection systems requires longitudinal data: the full arc of a relationship over time, with session boundaries preserved. This data is scarce in research settings, because it requires either real platform data (which raises obvious consent and ethical issues) or synthetic data generation with careful attention to temporal realism.&lt;/p&gt;

&lt;p&gt;SENTINEL ships with a synthetic research dataset of 50 annotated grooming conversations, designed for temporal analysis. It's a starting point; extending it is an explicit project goal. Academic researchers who want to build on this or contribute temporal datasets are specifically invited to engage — reach out at &lt;a href="mailto:sentinel.childsafety@gmail.com"&gt;sentinel.childsafety@gmail.com&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where this leaves detection systems
&lt;/h2&gt;

&lt;p&gt;The practical implication is that building effective grooming detection requires choosing a unit of analysis that matches the phenomenon: not the message, not even the session, but the behavioral trajectory across sessions over time.&lt;/p&gt;

&lt;p&gt;Systems that operate at the message level will always face the fundamental evasion problem: a predator who knows not to send any individual message that crosses a threshold can groom successfully while generating only normal-looking messages at each individual checkpoint. Systems that track behavioral trajectories can detect the escalation pattern even when no individual message is above threshold.&lt;/p&gt;

&lt;p&gt;This is why SENTINEL's architecture is built around behavioral profiling rather than per-message classification — and why the temporal layer is central to the detection model rather than an add-on.&lt;/p&gt;




&lt;p&gt;SENTINEL is open source and free for platforms under $100k annual revenue: &lt;a href="https://github.com/sentinel-safety/SENTINEL" rel="noopener noreferrer"&gt;https://github.com/sentinel-safety/SENTINEL&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For questions, dataset contributions, or research collaboration: &lt;a href="mailto:sentinel.childsafety@gmail.com"&gt;sentinel.childsafety@gmail.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>opensource</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Inside SENTINEL: How 13 Microservices Detect Child Grooming by Behavior, Not Keywords</title>
      <dc:creator>sentinel-safety</dc:creator>
      <pubDate>Sat, 25 Apr 2026 15:06:05 +0000</pubDate>
      <link>https://forem.com/sentinelsafety/inside-sentinel-how-13-microservices-detect-child-grooming-by-behavior-not-keywords-42p5</link>
      <guid>https://forem.com/sentinelsafety/inside-sentinel-how-13-microservices-detect-child-grooming-by-behavior-not-keywords-42p5</guid>
      <description>&lt;p&gt;Keyword filters are a solved problem — solved by predators. They learned years ago to spell things differently, avoid flagged words, and simply groom slowly enough that no single message triggers a filter. The result: every major platform relying solely on keyword detection is running safety infrastructure that the most dangerous users have already mapped and bypassed.&lt;/p&gt;

&lt;p&gt;SENTINEL takes a different approach. Instead of asking "does this message contain a bad word?", it asks "does this person's behavior, over time, resemble the trajectory of a predator approaching a minor?"&lt;/p&gt;

&lt;p&gt;This post covers how that works at an engineering level.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Four Signal Layers
&lt;/h2&gt;

&lt;p&gt;SENTINEL's risk scoring is built on four independent signal layers feeding into a weighted ensemble:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Linguistic Analysis
&lt;/h3&gt;

&lt;p&gt;NLP signals beyond keyword matching: sentiment trajectory across a conversation, escalation in intimacy markers, attempts to isolate the target from other users, and lexical similarity to known grooming conversation patterns. Models are trained on synthetic and research-derived datasets — never real user data.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Graph Analysis
&lt;/h3&gt;

&lt;p&gt;Who is talking to whom, at what frequency, and with what structural characteristics. An account belonging to a 40-year-old with zero peer-age connections, making rapid friend requests to accounts flagged as likely minors, looks very different from an 18-year-old talking to their gaming friends. Graph signals detect coordinated targeting, unusual relationship formation rates, and network centrality anomalies.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Temporal Analysis
&lt;/h3&gt;

&lt;p&gt;Grooming has a temporal signature. Conversation escalation follows recognizable progressions. Contact frequency patterns — how often someone messages a specific user, at what times, with what regularity — are informative signals independent of content. SENTINEL builds time-series models of behavioral escalation across sessions.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Fairness Audit Layer
&lt;/h3&gt;

&lt;p&gt;Before any composite score is emitted, it passes through demographic parity checks. If the system would flag members of one demographic group at a materially different rate than another for identical behavior, the score is held until the discrepancy is resolved. This is enforced at runtime, not just during training.&lt;/p&gt;

&lt;p&gt;The four layers produce a composite score from 0–100 with four tiers: &lt;code&gt;trusted&lt;/code&gt;, &lt;code&gt;watch&lt;/code&gt;, &lt;code&gt;restrict&lt;/code&gt;, &lt;code&gt;critical&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 13 Microservices
&lt;/h2&gt;

&lt;p&gt;SENTINEL ships as a Docker Compose stack of 13 independent services. Each can be deployed incrementally — you do not need the full stack to get value.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Pipeline
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. &lt;code&gt;event-ingestor&lt;/code&gt;&lt;/strong&gt; — The entry point. Accepts raw events (messages, relationship changes, login events) via REST API or webhook. Normalizes, validates, and routes to the internal queue. Handles 10k+ events/second per instance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. &lt;code&gt;nlp-scorer&lt;/code&gt;&lt;/strong&gt; — Consumes events from the queue. Runs the linguistic analysis pipeline: tokenization, entity extraction, sentiment analysis, escalation detection. Emits linguistic signal scores to the aggregator.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. &lt;code&gt;graph-builder&lt;/code&gt;&lt;/strong&gt; — Maintains the relationship graph in a vector database. On each new relationship event, updates edge weights, recalculates centrality, and flags anomalous graph formation. Uses incremental graph algorithms to avoid full recomputation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. &lt;code&gt;temporal-tracker&lt;/code&gt;&lt;/strong&gt; — Maintains per-user time-series of behavioral events. Computes rate-of-change signals, session frequency patterns, and contact escalation curves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. &lt;code&gt;risk-aggregator&lt;/code&gt;&lt;/strong&gt; — The ensemble. Pulls scores from the three signal services, applies the weighted ensemble model, runs the fairness gate, and writes the final risk score to the score store.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. &lt;code&gt;score-store&lt;/code&gt;&lt;/strong&gt; — PostgreSQL-backed store for all risk scores with full history. Every score change is recorded with the contributing signals and their weights. The record contains not just "the score is 74" but which signals contributed, by how much, and at what timestamp.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compliance and Audit
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;7. &lt;code&gt;audit-chain&lt;/code&gt;&lt;/strong&gt; — Every moderator action, every automated action, every score change produces a cryptographically signed audit event. Events are chained (each includes the hash of the previous), making retroactive tampering detectable. Retained for 7 years, designed to serve as legal evidence.&lt;/p&gt;
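
&lt;p&gt;Verifying such a chain is cheap: recompute each record's hash and confirm it still covers its predecessor. A sketch of the property (not the service's actual record format):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import json

def verify_chain(records):
    """Return True if no record has been altered since it was written."""
    prev = "0" * 64
    for rec in records:
        body = {k: v for k, v in rec.items() if k != "hash"}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["hash"] != expected or rec.get("prev_hash") != prev:
            return False  # tampering (or corruption) detected
        prev = rec["hash"]
    return True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;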

&lt;p&gt;&lt;strong&gt;8. &lt;code&gt;compliance-engine&lt;/code&gt;&lt;/strong&gt; — Per-tenant regulatory configuration. Handles GDPR right-to-erasure (soft-deletes with zero-knowledge proof of deletion), COPPA data retention limits, DSA reporting endpoint generation, and OSA audit export formatting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;9. &lt;code&gt;alert-dispatcher&lt;/code&gt;&lt;/strong&gt; — Watches the score store for threshold crossings. On &lt;code&gt;critical&lt;/code&gt; tier transitions, fires webhook callbacks, generates moderator queue entries, and (if configured) prepares NCMEC CyberTipline-formatted evidence packages.&lt;/p&gt;

&lt;h3&gt;
  
  
  Federation Layer
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;10. &lt;code&gt;federation-gateway&lt;/code&gt;&lt;/strong&gt; — The privacy-preserving threat intelligence layer. When a user reaches &lt;code&gt;critical&lt;/code&gt; tier, a cryptographic signal (not identifying data, not message content) is shared with opted-in peer platforms. Peers receive a risk signal for a pseudonymous identifier and can check for a matching user in their own system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;11. &lt;code&gt;identity-resolver&lt;/code&gt;&lt;/strong&gt; — Maps between external platform identifiers and SENTINEL's internal pseudonymous IDs. Raw platform user IDs never appear in logs, federation signals, or audit exports.&lt;/p&gt;

&lt;h3&gt;
  
  
  Developer Interface
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;12. &lt;code&gt;api-gateway&lt;/code&gt;&lt;/strong&gt; — The external-facing REST API. Handles authentication, rate limiting, per-tenant routing, and SDK compatibility. The Python and Node.js SDKs talk exclusively to this service.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;13. &lt;code&gt;dashboard-service&lt;/code&gt;&lt;/strong&gt; — The moderator web UI. Displays risk score queues, behavioral timelines, graph visualizations, and the human review workflow. Every score comes with a plain-language explanation of why, specifically to reduce moderator burnout from opaque black-box outputs.&lt;/p&gt;




&lt;h2&gt;
  
  
  How the Fairness Gate Works
&lt;/h2&gt;

&lt;p&gt;Before any risk score leaves the &lt;code&gt;risk-aggregator&lt;/code&gt;, it runs through the fairness gate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fairness_gate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;signals&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;demographic_proxy&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;baseline_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_population_flag_rate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;demographic_proxy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;predicted_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;estimate_flag_rate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;signals&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;demographic_proxy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;disparity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predicted_rate&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;baseline_rate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;baseline_rate&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;disparity&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;PARITY_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;FairnessViolation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Demographic parity violation: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;disparity&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; disparity detected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The threshold is configurable per deployment. When a &lt;code&gt;FairnessViolation&lt;/code&gt; is raised, the score is quarantined and flagged for human review rather than propagated downstream. This is not a soft warning — it is a hard stop.&lt;/p&gt;

&lt;p&gt;The default threshold (5% disparity) is informed by guidance in NIST's AI Risk Management Framework.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Federation Protocol
&lt;/h2&gt;

&lt;p&gt;The federation protocol is the most architecturally interesting piece. The goal: share threat intelligence across platforms without sharing any of the data that makes that intelligence sensitive.&lt;/p&gt;

&lt;p&gt;The flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Platform A detects a &lt;code&gt;critical&lt;/code&gt;-tier user. The &lt;code&gt;federation-gateway&lt;/code&gt; generates a hashed, salted pseudonymous token from the user's behavioral signals.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The token is broadcast to opted-in peers via a gossip protocol over mutual TLS.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Platform B receives the token. Its &lt;code&gt;identity-resolver&lt;/code&gt; checks whether any of its users produce a matching token under the shared salt.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If a match is found, Platform B's &lt;code&gt;risk-aggregator&lt;/code&gt; applies a federation risk boost to that user's score.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No messages are shared. No usernames. No IPs. Platform A never learns which users on Platform B were matched. A predator banned on one platform gets flagged on another within minutes, with zero raw data crossing platform boundaries.&lt;/p&gt;
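
&lt;p&gt;A sketch of the token mechanics in steps 1 and 3 (HMAC-SHA256 is our illustrative choice; the protocol's actual derivation may differ):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import hmac

def federation_token(behavioral_signature: bytes, shared_salt: bytes) -&amp;gt; str:
    """Salted, non-reversible token derived from a behavioral signature."""
    return hmac.new(shared_salt, behavioral_signature, hashlib.sha256).hexdigest()

# Platform B: check whether any local user matches the broadcast token
def has_match(token, local_signatures, shared_salt):
    return any(federation_token(sig, shared_salt) == token for sig in local_signatures)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;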

&lt;p&gt;This is v1 of the federation protocol. The roadmap includes k-anonymity enhancements and a formal differential privacy layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Integration
&lt;/h2&gt;

&lt;p&gt;The entire integration surface is the event ingestor API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentinel_safety&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentinelClient&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentinelClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_tenant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Send a message event
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ingest_event&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sender_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_abc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recipient_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_xyz&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platform_room_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;room_123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-04-25T12:00:00Z&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# Content hash only — raw messages never leave your platform
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content_hash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Get current risk score
&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_risk_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_abc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c1"&gt;# "watch"
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;# 47
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Plain-language explanation of contributing signals
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Content is never sent to SENTINEL — only a hash, alongside behavioral metadata. NLP analysis runs client-side via the SDK; only extracted signal scores reach the ingestor. Raw messages never leave your platform.&lt;/p&gt;

&lt;p&gt;Time to first integration: under an hour.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.12, FastAPI for all internal services&lt;/li&gt;
&lt;li&gt;PostgreSQL (score store, audit chain)&lt;/li&gt;
&lt;li&gt;Redis (event queue, session state)&lt;/li&gt;
&lt;li&gt;Qdrant (vector database for graph embeddings)&lt;/li&gt;
&lt;li&gt;Docker Compose for local and self-hosted deployment&lt;/li&gt;
&lt;li&gt;OpenTelemetry throughout for observability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No proprietary cloud services required. Deployable on any provider.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;SENTINEL v1.0 is live: &lt;a href="https://github.com/sentinel-safety/SENTINEL" rel="noopener noreferrer"&gt;github.com/sentinel-safety/SENTINEL&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The roadmap: federated learning enhancements (on-device model updates without data sharing), k-anonymity improvements to the federation protocol, expansion of the research dataset beyond the current v1 baseline, and formal academic publication of the behavioral detection methodology.&lt;/p&gt;

&lt;p&gt;If you are building a platform where minors are present and have not yet implemented proactive safety measures, SENTINEL is designed to remove every excuse. Setup is a Docker Compose file and an API key. Compliance infrastructure is included. The audit trail is automatic.&lt;/p&gt;

&lt;p&gt;Commercial licensing for platforms over $100k annual revenue: &lt;a href="mailto:sentinel.childsafety@gmail.com"&gt;sentinel.childsafety@gmail.com&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;SENTINEL is built and maintained by the Sentinel Foundation. v1.0 released April 2026.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>architecture</category>
      <category>python</category>
      <category>security</category>
    </item>
    <item>
      <title>Fairness in Child Safety AI: Why Demographic Parity Audits Are Not Optional</title>
      <dc:creator>sentinel-safety</dc:creator>
      <pubDate>Sat, 25 Apr 2026 12:36:48 +0000</pubDate>
      <link>https://forem.com/sentinelsafety/fairness-in-child-safety-ai-why-demographic-parity-audits-are-not-optional-3iem</link>
      <guid>https://forem.com/sentinelsafety/fairness-in-child-safety-ai-why-demographic-parity-audits-are-not-optional-3iem</guid>
      <description>&lt;p&gt;There's a particular failure mode in content moderation AI that the industry doesn't talk about enough: the system works, on average, but it works badly for specific groups.&lt;/p&gt;

&lt;p&gt;Keyword filters disproportionately flag African-American Vernacular English. Toxicity classifiers flag LGBTQ+ content at higher rates than equivalent heteronormative content. Spam detection penalizes non-native English speakers. These failures are documented, reproducible, and — when they happen in a child safety context — cause serious harm.&lt;/p&gt;

&lt;p&gt;If your child safety detection system disproportionately flags minors from certain demographic groups as high-risk, you're not just making mistakes. You're making systematic mistakes that will expose specific communities to greater scrutiny, greater false suspicion, and potentially greater harm from over-moderation. At the same time, you may be under-flagging true positives in other demographic groups — leaving some children less protected.&lt;/p&gt;

&lt;p&gt;This is why fairness enforcement in child safety AI is not optional. And it's why we built demographic parity audits as an architectural enforcement mechanism in SENTINEL — not a metric to monitor, but a gate that blocks deployment.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Fairness Actually Means in Detection Systems
&lt;/h2&gt;

&lt;p&gt;"Fairness" in ML has multiple mathematical definitions that are often in tension with each other. For a detection system, the most relevant concepts are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demographic parity (statistical parity):&lt;/strong&gt; The system flags roughly equal proportions of each demographic group. If 5% of adult users overall are flagged as high-risk, demographic parity requires that roughly 5% of adult users from any given demographic group are also flagged.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Equal opportunity:&lt;/strong&gt; The true positive rate is equal across groups. If the system correctly identifies 80% of genuine threats in one group, it should identify roughly 80% in all groups.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Equalized odds:&lt;/strong&gt; Both true positive rate and false positive rate are equal across groups.&lt;/p&gt;

&lt;p&gt;These three definitions often conflict. A system that achieves demographic parity may fail equal opportunity (if the base rate of actual threats differs across groups). A system optimized for equal opportunity may produce different false positive rates across groups.&lt;/p&gt;
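
&lt;p&gt;To make the definitions concrete, here is a minimal sketch that computes all three families of rates over a labeled evaluation set. The array names are illustrative, not SENTINEL's API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def parity_metrics(y_true, y_flag, groups):
    # y_true: 1 = genuine threat, 0 = non-threat; y_flag: did the model flag?
    out = {}
    for g in np.unique(groups):
        m = groups == g
        flagged, labels = y_flag[m], y_true[m]
        out[g] = {
            "flag_rate": flagged.mean(),  # demographic parity compares this
            "tpr": flagged[labels == 1].mean() if (labels == 1).any() else None,  # equal opportunity
            "fpr": flagged[labels == 0].mean() if (labels == 0).any() else None,  # equalized odds uses both
        }
    return out
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;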

&lt;p&gt;For SENTINEL, we selected demographic parity as the primary fairness gate, with supplementary monitoring of false positive parity. Here's the reasoning:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The false positive risk is the most immediately harmful.&lt;/strong&gt; A false positive in a child safety context means a user who posed no threat is flagged, their account possibly restricted, and their behavior scrutinized. If false positive rates are higher for, say, Latino users than white users on the same platform, you've built a system that disproportionately harms a specific community. This is a direct civil rights issue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The base rate problem is real but doesn't justify disparate impact.&lt;/strong&gt; Some argue that demographic parity is too strict because different groups may have different base rates of predatory behavior. This argument is theoretically interesting and practically dangerous. Predatory behavior is a property of individuals, not groups. A model whose outputs track group membership rather than individual behavior is producing biased predictions. Demographic parity is the correct standard.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Fairness Failures Look Like in Practice
&lt;/h2&gt;

&lt;p&gt;The research on algorithmic fairness in related domains gives us a detailed picture of how these failures happen:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Training data skew.&lt;/strong&gt; If your training dataset of known grooming patterns was compiled primarily from English-language, North American platform data, your model has seen many examples of how grooming looks in that cultural-linguistic context. It has seen fewer examples of how it looks in other contexts. The result: lower true positive rates (worse recall) for grooming patterns from underrepresented communities, and potentially higher false positive rates as the model over-indexes on surface-level features that happen to correlate with certain communities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Feature selection bias.&lt;/strong&gt; If your linguistic signal layer uses n-gram or word embedding features trained on general-purpose English text, those features will not generalize equally across dialects, languages, and communication styles. A detection system trained to flag certain vocabulary patterns will flag non-standard English usage as anomalous — even when it's not anomalous for the users in question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Label bias.&lt;/strong&gt; If your training labels (confirmed grooming cases) were generated by a moderation team that itself had biased moderation practices, that bias propagates into the model. Garbage in, garbage out — but specifically, biased garbage in, systematically biased model out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Feedback loops.&lt;/strong&gt; A deployed model that produces disparate false positive rates creates its own future training data. More false positive labels from community X mean community X is more represented in the "flagged" training data, which reinforces the bias in the next model version.&lt;/p&gt;




&lt;h2&gt;
  
  
  How SENTINEL's Fairness Gate Works
&lt;/h2&gt;

&lt;p&gt;SENTINEL implements fairness enforcement as a pre-deployment gate. Before any detection model — or update to an existing model — can be deployed, it must pass a demographic parity audit.&lt;/p&gt;

&lt;p&gt;The audit process:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Generate a fairness evaluation dataset.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is a dataset of simulated or synthetic behavioral profiles representing a range of demographic groups, with ground-truth labels (threat / non-threat). The evaluation dataset is separate from the training data. It's designed to represent the demographic diversity of the platform's user base.&lt;/p&gt;

&lt;p&gt;SENTINEL ships with a synthetic evaluation dataset. Platforms are encouraged to extend it with platform-specific data that represents their actual user demographics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Run the model against the evaluation dataset.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The model generates risk scores for all profiles in the evaluation set. Scores are recorded along with demographic labels.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Compute parity metrics.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For each demographic group represented in the evaluation set, SENTINEL computes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flag rate (what percentage of profiles from this group are scored above the threshold)&lt;/li&gt;
&lt;li&gt;False positive rate (among profiles labeled non-threat, what percentage are scored above threshold)&lt;/li&gt;
&lt;li&gt;True positive rate (among profiles labeled threat, what percentage are scored above threshold)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Apply parity thresholds.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SENTINEL's default thresholds: flag rate must be within ±20% of the overall flag rate for any group with sufficient representation. False positive rate must be within ±15% of the overall false positive rate.&lt;/p&gt;

&lt;p&gt;These thresholds are configurable by platform. A platform may want stricter thresholds, or may have a different trade-off profile. The defaults are conservative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Gate or pass.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If any demographic group fails the parity threshold, the model &lt;strong&gt;cannot be deployed&lt;/strong&gt;. This is enforced in the platform's model deployment pipeline — not a warning, not a recommendation, a hard block.&lt;/p&gt;

&lt;p&gt;A fairness failure produces a detailed report: which group failed, what the actual vs. threshold disparity was, and what the model's overall performance metrics are. This report is included in the audit log.&lt;/p&gt;
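
&lt;p&gt;Putting steps 3 through 5 together, the gate logic itself is small. This sketch reuses &lt;code&gt;parity_metrics&lt;/code&gt; from above and interprets the ±20%/±15% bounds as relative tolerances; it illustrates the shape of the check, not the actual pipeline code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def fairness_gate(metrics, overall_flag_rate, overall_fpr,
                  flag_tol=0.20, fpr_tol=0.15):
    # Returns the failing (group, metric, value) triples; empty list = pass.
    failures = []
    for group, m in metrics.items():
        if abs(m["flag_rate"] - overall_flag_rate) &gt; flag_tol * overall_flag_rate:
            failures.append((group, "flag_rate", m["flag_rate"]))
        if m["fpr"] is not None and abs(m["fpr"] - overall_fpr) &gt; fpr_tol * overall_fpr:
            failures.append((group, "fpr", m["fpr"]))
    return failures

failures = fairness_gate(parity_metrics(y_true, y_flag, groups),
                         overall_flag_rate=y_flag.mean(),
                         overall_fpr=y_flag[y_true == 0].mean())
if failures:
    # Hard block: the deployment pipeline stops here, and the report
    # (failing groups, actual vs. threshold disparity) goes to the audit log.
    raise SystemExit(f"Deployment blocked: {failures}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;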




&lt;h2&gt;
  
  
  Why It's Enforced, Not Monitored
&lt;/h2&gt;

&lt;p&gt;An earlier iteration of SENTINEL had fairness metrics as a monitoring dashboard — visible, reported, but not blocking. This turned out to be insufficient.&lt;/p&gt;

&lt;p&gt;The problem with monitoring-only approaches is that fairness failures in production are hard to detect and slow to surface. A 15% disparity in false positive rates between demographic groups might not be visible in aggregate moderation metrics. It won't be visible at all if the platform's reporting doesn't disaggregate by demographic group. And even if it's visible, the feedback loop from "we detected a fairness problem" to "we retrained and deployed a fixed model" is measured in weeks or months.&lt;/p&gt;

&lt;p&gt;During that time, the biased model is flagging users at disparate rates. Real users are experiencing real harm.&lt;/p&gt;

&lt;p&gt;Pre-deployment enforcement changes the dynamic entirely. A model that fails the fairness audit never reaches users. The harm never happens. The feedback loop is closed before deployment, not after.&lt;/p&gt;

&lt;p&gt;This is the same logic as testing in software development. You can find bugs in production through monitoring, or you can find bugs before production through testing. Testing is better.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Contribution Fairness Requirement
&lt;/h2&gt;

&lt;p&gt;SENTINEL's fairness gate applies not just to the core platform, but to any behavioral detection model contributed to the project.&lt;/p&gt;

&lt;p&gt;The CONTRIBUTING.md is explicit: any pull request that modifies detection logic must include a fairness analysis. This means contributors need to run the fairness evaluation suite on their modifications and include the results in their PR. PRs that improve detection performance at the cost of fairness parity will not be merged.&lt;/p&gt;

&lt;p&gt;This creates a useful forcing function for contributors: if your modification to the linguistic signal layer improves detection accuracy overall but creates a 25% disparity in false positive rates for non-English speakers, you know before you submit the PR. You can iterate on the modification before it gets to review.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Harder Questions
&lt;/h2&gt;

&lt;p&gt;Demographic parity as a gate answers one question: is the model systematically unfair? But it doesn't answer harder questions that any mature child safety system will eventually confront:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What demographic categories should be measured?&lt;/strong&gt; Race, ethnicity, gender, age, language, nationality? The choice of demographic categories is itself a value judgment, and not all categories are measurable from platform data. SENTINEL's default evaluation framework includes age (adult/minor), detected language, and account age as proxies. Platform-specific deployments can extend this with additional categories.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if higher-risk groups produce legitimate base rate differences?&lt;/strong&gt; This question is often raised as a challenge to demographic parity. Our answer: base rate differences in predatory behavior are not established empirically at the population level. They may be artifacts of over-policing — certain communities are more surveilled, so more of their bad actors are caught, so training data is skewed. Demographic parity is the correct standard precisely because we cannot trust historical label data to accurately represent true base rates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What about intersectionality?&lt;/strong&gt; A model might be fair when analyzed by race and fair when analyzed by gender, but systematically unfair for users who are both a particular race and a particular gender. Intersectional fairness analysis is computationally expensive but increasingly recognized as necessary. SENTINEL's roadmap includes intersectional parity analysis as a future enhancement.&lt;/p&gt;
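
&lt;p&gt;The cost of intersectional analysis is easy to see with a back-of-the-envelope sketch over SENTINEL's default categories (the category values here are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from itertools import product

categories = {
    "age_band": ["adult", "minor"],
    "language": ["en", "es", "pt", "other"],
    "account_age": ["new", "established"],
}

# Marginal audits check each axis separately: 2 + 4 + 2 = 8 groups.
# Intersectional audits check every combination: 2 * 4 * 2 = 16 subgroups,
# and each subgroup needs enough evaluation samples for stable rate estimates.
subgroups = list(product(*categories.values()))
print(len(subgroups))  # 16
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;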




&lt;h2&gt;
  
  
  Why This Matters for Regulatory Compliance
&lt;/h2&gt;

&lt;p&gt;Both EU DSA and UK Online Safety Act contain non-discrimination provisions. Under the DSA, algorithmic decision systems must be non-discriminatory. Under the Online Safety Act, Ofcom can require platforms to demonstrate that their proactive safety systems do not produce disparate impact.&lt;/p&gt;

&lt;p&gt;These provisions are currently underspecified — regulators haven't yet issued detailed technical guidance on what fairness compliance looks like in practice. But the direction of travel is clear.&lt;/p&gt;

&lt;p&gt;A platform that can show pre-deployment fairness audits, documented parity metrics, and a hard gate preventing deployment of biased models is in a significantly stronger compliance position than one that monitors disparate impact in production and responds reactively.&lt;/p&gt;

&lt;p&gt;The best time to build fairness enforcement is before your platform is large enough to attract regulatory scrutiny. By then, you've already accumulated deployment history, training data, and potentially liability.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building It Right From the Start
&lt;/h2&gt;

&lt;p&gt;If you're building a new moderation system, or evaluating whether to integrate SENTINEL, the key takeaway is this: fairness enforcement is architecturally much easier when it's built in from the beginning.&lt;/p&gt;

&lt;p&gt;Retrofitting demographic parity audits onto an existing system requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auditing training data for demographic representation&lt;/li&gt;
&lt;li&gt;Building fairness evaluation datasets you probably don't have&lt;/li&gt;
&lt;li&gt;Modifying deployment pipelines to include fairness gates&lt;/li&gt;
&lt;li&gt;Retraining models that may have been in production for years&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you start with a fairness-gate-enforced framework, you never accumulate this technical debt. Every model trained on your platform, from day one, has been evaluated for demographic parity. Every deployment decision has been documented.&lt;/p&gt;

&lt;p&gt;For child safety specifically, this matters more than in almost any other domain. The population you're protecting — children — is exactly the population least able to advocate for themselves when they're being harmed by algorithmic bias. Building fair systems is an architectural decision, not an aspiration.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;SENTINEL's fairness gate and demographic parity audit are open source and fully documented. GitHub: &lt;a href="https://github.com/sentinel-safety/SENTINEL" rel="noopener noreferrer"&gt;https://github.com/sentinel-safety/SENTINEL&lt;/a&gt;. The fairness evaluation framework is documented in CONTRIBUTING.md.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Privacy-Preserving Threat Federation: How Platforms Can Share Intelligence Without Sharing Data</title>
      <dc:creator>sentinel-safety</dc:creator>
      <pubDate>Sat, 25 Apr 2026 10:26:56 +0000</pubDate>
      <link>https://forem.com/sentinelsafety/privacy-preserving-threat-federation-how-platforms-can-share-intelligence-without-sharing-data-37g4</link>
      <guid>https://forem.com/sentinelsafety/privacy-preserving-threat-federation-how-platforms-can-share-intelligence-without-sharing-data-37g4</guid>
      <description>&lt;p&gt;Here's a problem that every trust and safety team eventually runs into: predators don't stay on one platform.&lt;/p&gt;

&lt;p&gt;A person who is caught grooming children on Platform A, banned, and deleted — simply opens an account on Platform B. Platform B has no way of knowing. Platform B's moderation team starts from zero. The predator has a clean slate.&lt;/p&gt;

&lt;p&gt;This isn't a hypothetical. It's documented behavior. In the child safety space, researchers have consistently found that serial offenders operate across multiple platforms simultaneously, maintaining different personas for different targets. When one platform bans an account, another platform absorbs the risk.&lt;/p&gt;

&lt;p&gt;The obvious solution is for platforms to share information. But sharing information between platforms creates serious privacy problems. How do you tell Platform B about a threat without giving Platform B access to Platform A's users' private communications?&lt;/p&gt;

&lt;p&gt;This is the federation problem, and solving it correctly is genuinely difficult.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Platforms Have Tried
&lt;/h2&gt;

&lt;p&gt;The main existing approach to cross-platform threat sharing in the child safety space is &lt;strong&gt;perceptual hash matching&lt;/strong&gt; — most famously implemented in PhotoDNA, with hash lists maintained by NCMEC, IWF, and others.&lt;/p&gt;

&lt;p&gt;The idea is elegant: take a known piece of CSAM, compute a hash that captures its visual "fingerprint," share that fingerprint without sharing the image. When another platform encounters a matching image, they can detect it without ever seeing the original.&lt;/p&gt;

&lt;p&gt;This works extremely well for CSAM detection. It's been responsible for tens of millions of reports globally.&lt;/p&gt;

&lt;p&gt;But hash matching has a hard limitation: it only works for content that has already been identified. It cannot detect new offenders. It cannot detect behavioral patterns. And it cannot detect grooming, which typically involves no CSAM at all in its early stages — just ordinary conversation.&lt;/p&gt;

&lt;p&gt;For behavioral threat intelligence, no equivalent infrastructure exists. When Platform A bans a groomer after a three-month escalation pattern, Platform B learns nothing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What "Federation" Could Mean
&lt;/h2&gt;

&lt;p&gt;In the security world, threat intelligence federation is well-established. MISP, STIX/TAXII, and other standards allow organizations to share indicators of compromise (IoCs), attack signatures, and threat actor TTPs. The question is whether this can be adapted to behavioral threat intelligence for child safety in a privacy-preserving way.&lt;/p&gt;

&lt;p&gt;The challenge is that behavioral threat intelligence is inherently more sensitive than, say, a malicious IP address. A behavioral threat record might contain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Temporal patterns (when this account was active)&lt;/li&gt;
&lt;li&gt;Linguistic pattern features (style of communication)&lt;/li&gt;
&lt;li&gt;Relationship graph structure (how many connections, what frequency)&lt;/li&gt;
&lt;li&gt;Account metadata (creation date, device fingerprints)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Any of this, if transmitted in the raw, is personally identifiable data subject to GDPR, CCPA, and other privacy regulations. Platform A cannot simply export these records and transmit them to Platform B.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Cryptographic Signature Approach
&lt;/h2&gt;

&lt;p&gt;SENTINEL's federation layer uses a different model: instead of sharing behavioral data, it shares &lt;strong&gt;cryptographic signatures derived from behavioral patterns&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here's the key insight: you don't need to share the data to share the threat signal. You need to share a representation of the data that is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Specific enough that Platform B can detect the same threat&lt;/li&gt;
&lt;li&gt;Generic enough that it doesn't reveal personal information about the individual who generated it&lt;/li&gt;
&lt;li&gt;Mathematically bound to the actual behavioral evidence, so it can't be fabricated&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In practice, it works like this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform A&lt;/strong&gt; detects a confirmed grooming pattern. Their system generates a behavioral signature: a vector representation derived from the multi-dimensional behavioral profile of this pattern — the combination of linguistic drift, temporal escalation, graph structure, and contact dynamics that together characterized this threat. This vector is not reversible back to the original behavioral data. It's a representation, not a copy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform A&lt;/strong&gt; submits this signature (with no user PII attached) to the SENTINEL federation service. The federation service stores signatures only — it never receives raw behavioral data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform B&lt;/strong&gt;, when analyzing a new user's behavior, computes behavioral vectors from their own platform's data and queries the federation service: does this user's behavioral profile match any known threat signature?&lt;/p&gt;

&lt;p&gt;If there's a match above a confidence threshold, Platform B gets an alert: "This account's behavioral pattern matches a confirmed threat signature from a federated platform." The alert contains no information about which platform generated the signature, who the original account was, or what specifically they did.&lt;/p&gt;

&lt;p&gt;Platform B's moderators review the alert, examine the current platform's own behavioral data, and make an independent determination.&lt;/p&gt;
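
&lt;p&gt;The query side reduces to a nearest-neighbor check against the stored signatures. A minimal sketch, assuming unit-length behavioral vectors and an illustrative confidence threshold (neither the threshold nor the function names are SENTINEL's actual values):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

MATCH_THRESHOLD = 0.92  # illustrative; tuned per deployment

def query_federation(user_vec, threat_signatures):
    # Cosine similarity against every stored signature; with unit-length
    # vectors this is just a dot product.
    best = max((float(np.dot(user_vec, s)) for s in threat_signatures), default=0.0)
    if best &gt;= MATCH_THRESHOLD:
        # The alert carries a confidence only: no source platform, no PII.
        return {"match": True, "confidence": best}
    return {"match": False, "confidence": best}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;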




&lt;h2&gt;
  
  
  What the Federation Service Knows
&lt;/h2&gt;

&lt;p&gt;The federation service in this architecture is privacy-minimal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It stores behavioral signature vectors (not reversible to personal data)&lt;/li&gt;
&lt;li&gt;It knows which platform submitted each signature (for federation governance)&lt;/li&gt;
&lt;li&gt;It knows when each signature was submitted&lt;/li&gt;
&lt;li&gt;It does not know: any platform's users, any user's identity, the content of any conversation, or any PII&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means a compromise of the federation service does not expose user data from any platform. An attacker who gains access to the signature database gets a set of high-dimensional vectors with no direct link to individuals.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Trust Problem in Federation
&lt;/h2&gt;

&lt;p&gt;Cryptographic privacy is necessary but not sufficient. There's also a trust problem: for federation to work, Platform B needs to be able to trust that signatures submitted by Platform A represent real, confirmed threats — not false positives, not fabricated data.&lt;/p&gt;

&lt;p&gt;This is where federation governance matters.&lt;/p&gt;

&lt;p&gt;SENTINEL's federation model is opt-in, and participation requires a signed federation agreement. Platforms that submit signatures to the federation service attest that the signature represents a confirmed grooming pattern — not just a flagged behavior, not an unreviewed algorithmic output, but a human-reviewed, confirmed case.&lt;/p&gt;

&lt;p&gt;This creates accountability. A platform that submits low-quality signatures (high false positive rate, or — worse — deliberately weaponized signals targeting legitimate users) can be suspended from the federation.&lt;/p&gt;

&lt;p&gt;The governance model draws from the existing ISAC (Information Sharing and Analysis Center) model used in cybersecurity. The key adaptations for child safety context are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stricter confirmation requirements before signature submission (human review required, not just algorithmic flag)&lt;/li&gt;
&lt;li&gt;Lower confidence threshold for alerts (a match is treated as a reason to investigate, not a reason to ban)&lt;/li&gt;
&lt;li&gt;Right to appeal — users flagged via federated signatures have a clear process to challenge the match&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Privacy Preservation in Practice
&lt;/h2&gt;

&lt;p&gt;Three specific privacy risks need to be addressed in any federation system:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Risk 1: Signature linkage across platforms.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If behavioral signatures are deterministic — the same behavioral data always produces the same signature — then Platform A and Platform B could cross-reference their signature databases to identify users who have accounts on both platforms. This is a privacy violation even if neither platform has the other's data.&lt;/p&gt;

&lt;p&gt;SENTINEL's signatures are non-deterministic: they incorporate platform-specific entropy, so the same underlying behavioral pattern produces different signatures on different platforms. Cross-platform account linkage is not possible from signatures alone.&lt;/p&gt;
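
&lt;p&gt;One way to get this property is to derive a small, reproducible perturbation from a platform-specific salt before normalizing the signature: the same pattern then produces different vectors on different platforms, while similarity matching still succeeds. A sketch under those assumptions, not SENTINEL's actual construction:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def platform_signature(pattern_vec, platform_salt, noise_scale=0.05):
    # Seed the perturbation from the salt so it is stable within a platform
    # but different across platforms.
    rng = np.random.default_rng(int.from_bytes(platform_salt, "big"))
    sig = pattern_vec + rng.normal(0.0, noise_scale, size=pattern_vec.shape)
    return sig / np.linalg.norm(sig)

# Same behavioral pattern, two platforms: the raw signatures differ
# (no exact-value linkage), but their cosine similarity stays high
# enough for threshold matching.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;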

&lt;p&gt;&lt;strong&gt;Risk 2: Inference of sensitive attributes from behavioral patterns.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Behavioral patterns can inadvertently encode demographic information. A behavioral detection system trained on datasets that skew toward certain demographic groups might produce signatures that are statistically correlated with age, ethnicity, or gender. This is both a fairness problem and a privacy problem.&lt;/p&gt;

&lt;p&gt;SENTINEL addresses this through the fairness gate: before any behavioral model is used to generate federation signatures, it must pass a demographic parity audit. If signature generation is found to be correlated with protected attributes, the model cannot be deployed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Risk 3: Abuse of the federation alert mechanism.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If Platform B receives an alert saying "this user matches a known threat signature," that alert itself is sensitive. It needs to be treated as confidential information — not disclosed to the flagged user (which would tip off the threat) and not retained longer than necessary for the moderation review.&lt;/p&gt;

&lt;p&gt;SENTINEL's federation alerts are ephemeral: they're generated on query, delivered to Platform B's moderation queue, and not stored by the federation service. Platform B's own retention policies apply to the alert records, subject to the same erasure handling as other behavioral data.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Looks Like for a Small Platform
&lt;/h2&gt;

&lt;p&gt;If you're a small platform considering federation participation, the operational picture is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;You submit signatures&lt;/strong&gt; only for confirmed grooming cases that have been reviewed by a human moderator.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Your federation queries&lt;/strong&gt; run asynchronously in the background as part of normal behavioral analysis — you don't need to integrate a separate federation lookup step.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;When you receive a federation match alert&lt;/strong&gt;, it appears in your moderation queue alongside SENTINEL's own behavioral risk score for that user. The alert is one data point; your moderator reviews it alongside your platform's own evidence.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Nothing from your platform&lt;/strong&gt; — no user data, no conversation content, no PII — is ever transmitted to the federation service or to any other platform.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The federation participation agreement is included in SENTINEL's repository. It covers the confirmation requirements, dispute resolution, and grounds for suspension.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Larger Picture
&lt;/h2&gt;

&lt;p&gt;The fundamental problem — predators migrating between platforms — won't be solved by any single platform improving its own detection. It requires coordination.&lt;/p&gt;

&lt;p&gt;The CSAM hash matching infrastructure (PhotoDNA / NCMEC / IWF) shows that privacy-preserving cross-platform coordination is achievable at scale. The same principle — share signatures, not content — can be extended to behavioral threat intelligence.&lt;/p&gt;

&lt;p&gt;The infrastructure to do it exists. The open question is whether the industry will adopt it. That adoption requires trust, governance, and tooling that makes participation low-friction for small platforms that don't have dedicated T&amp;amp;S engineering teams.&lt;/p&gt;

&lt;p&gt;That's what SENTINEL's federation layer is designed to be: production-grade behavioral threat federation that a small platform can deploy in an afternoon, participate in responsibly, and benefit from within days.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;SENTINEL is an open-source behavioral intelligence platform for child safety compliance. The federation module is part of the core platform. Free for platforms under $100k revenue. GitHub: &lt;a href="https://github.com/sentinel-safety/SENTINEL" rel="noopener noreferrer"&gt;https://github.com/sentinel-safety/SENTINEL&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>privacy</category>
      <category>opensource</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Building Compliance-Native Child Safety: What DSA and UKOSA Actually Require</title>
      <dc:creator>sentinel-safety</dc:creator>
      <pubDate>Sat, 25 Apr 2026 10:10:00 +0000</pubDate>
      <link>https://forem.com/sentinelsafety/building-compliance-native-child-safety-what-dsa-and-ukosa-actually-require-11ac</link>
      <guid>https://forem.com/sentinelsafety/building-compliance-native-child-safety-what-dsa-and-ukosa-actually-require-11ac</guid>
      <description>&lt;p&gt;If you operate a platform where users under 18 might be present — a game, a community forum, a tutoring app, a messaging tool — there's a good chance you've heard that child safety regulations are getting stricter.&lt;/p&gt;

&lt;p&gt;You may have heard "DSA" and "UK Online Safety Act" mentioned. You might have a vague sense that you're probably in scope for something. But the actual requirements are surprisingly opaque, especially for smaller teams who can't afford a compliance consultant.&lt;/p&gt;

&lt;p&gt;This post walks through what DSA and UKOSA actually require, what counts as "reasonable" compliance for a small platform, and what you'd need to build (or deploy) to demonstrate it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Laws. One Problem.
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;EU Digital Services Act (DSA)&lt;/strong&gt; came into force for all platforms in February 2024. It applies to any online intermediary operating in the EU — regardless of where the platform is headquartered.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;UK Online Safety Act (UKOSA)&lt;/strong&gt; rolled out in phases through 2025, with additional categorization duties taking effect in July 2026. It applies to platforms with UK users — again, regardless of where you're based.&lt;/p&gt;

&lt;p&gt;Both laws operate on a tiered system. The obligations on a gaming indie studio with 10,000 users are dramatically different from those on a Very Large Online Platform (VLOP) like Meta. But here's the thing smaller teams often miss: &lt;strong&gt;the baseline obligations apply to everyone&lt;/strong&gt;, including platforms that have never thought of themselves as being "in scope."&lt;/p&gt;




&lt;h2&gt;
  
  
  What Both Laws Actually Require (Baseline)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. You must have a process for content moderation
&lt;/h3&gt;

&lt;p&gt;Both laws require platforms to have documented, functioning processes for dealing with illegal content and harmful content involving minors. "We don't really have chat" is not a defense if your platform has any user-to-user communication feature.&lt;/p&gt;

&lt;p&gt;What this means practically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A written moderation policy that users can read&lt;/li&gt;
&lt;li&gt;A mechanism for users to report content&lt;/li&gt;
&lt;li&gt;A process for reviewing and acting on reports&lt;/li&gt;
&lt;li&gt;Documentation that you actually follow the process&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. You must have a way to report CSAM to authorities
&lt;/h3&gt;

&lt;p&gt;If child sexual abuse material appears on your platform (or is generated/distributed through it), you are required to report it. In the US, this means NCMEC CyberTipline reporting. Under the DSA, CSAM must be reported to national authorities (and, once established, to a planned EU center).&lt;/p&gt;

&lt;p&gt;What this means practically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need tooling that can generate evidence packages in the NCMEC reporting format (hash, timestamp, account information, content); a sketch of such a package follows this list&lt;/li&gt;
&lt;li&gt;You need a documented retention policy for evidence that might be needed in legal proceedings&lt;/li&gt;
&lt;li&gt;You need to know what your reporting obligations are in the jurisdictions you operate in&lt;/li&gt;
&lt;/ul&gt;
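
&lt;p&gt;For orientation only, here is the rough shape of such a package, built from the fields listed above. The field names are assumptions for illustration; the actual CyberTipline schema is defined by NCMEC:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
from datetime import datetime, timezone

def build_evidence_package(account_id, content_bytes, detected_at=None):
    # Illustrative structure only, not the actual CyberTipline schema.
    return {
        "content_sha256": hashlib.sha256(content_bytes).hexdigest(),
        "detected_at": (detected_at or datetime.now(timezone.utc)).isoformat(),
        "account": account_id,
        "retention_hold": True,  # evidence preserved per your retention policy
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;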

&lt;h3&gt;
  
  
  3. You must implement child safety measures if minors are in your user base
&lt;/h3&gt;

&lt;p&gt;This is where both laws get more specific. If you have users under 18 (or if you have any reason to believe you might), you're required to implement proportionate measures to prevent harmful contact with those users.&lt;/p&gt;

&lt;p&gt;The key word is "proportionate." A platform with 500 users has different obligations than TikTok. But "proportionate" does not mean "none."&lt;/p&gt;




&lt;h2&gt;
  
  
  The July 2026 Ofcom Categorization Register
&lt;/h2&gt;

&lt;p&gt;In July 2026, Ofcom will publish the UK's first Platform Categorization Register under UKOSA. This register will categorize platforms into tiers — and different tiers have different mandatory obligations.&lt;/p&gt;

&lt;p&gt;Here's what this means for smaller platforms: &lt;strong&gt;many platforms that currently believe they're below the threshold will discover they're not.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The categorization criteria include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Number of UK users&lt;/li&gt;
&lt;li&gt;Whether the platform allows user-to-user communication&lt;/li&gt;
&lt;li&gt;Whether users under 18 are present (or "likely to be present")&lt;/li&gt;
&lt;li&gt;Whether the platform has content that is "regulated content" under UKOSA&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you run a gaming platform with voice or text chat, and you have any UK users, you should be planning now for what category you might fall into.&lt;/p&gt;




&lt;h2&gt;
  
  
  What "Proactive" Child Safety Looks Like
&lt;/h2&gt;

&lt;p&gt;Both laws nudge platforms toward proactive (not just reactive) safety measures. Reactive safety is: someone reports abuse, you respond. Proactive safety is: you detect patterns of potential abuse before a report is filed.&lt;/p&gt;

&lt;p&gt;For most platforms, proactive safety has historically meant one thing: keyword filtering. Block certain words and phrases, flag messages that contain them.&lt;/p&gt;

&lt;p&gt;There are two problems with this approach, and regulators are increasingly aware of both:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 1: Keyword filters don't catch grooming.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Grooming is a process that unfolds over weeks or months. It typically begins with entirely normal, benign conversation — building trust, establishing a relationship, escalating gradually. The vocabulary of early-stage grooming looks nothing like the vocabulary platforms put on keyword lists. By the time a keyword triggers, significant harm has often already begun.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 2: Keyword filters create legal liability, not just safety.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A keyword filter that misses a grooming pattern, when documented, looks like a system that was designed to fail. When a regulator or plaintiff examines your moderation logs, "we had a keyword filter" is not a strong defense. "We monitored behavioral patterns and escalated to human moderators when patterns suggested risk" is a much stronger one.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Behavioral Detection Actually Requires
&lt;/h2&gt;

&lt;p&gt;If you want to implement behavioral detection — the approach that actually works against grooming — here's what you need:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Multi-session context
&lt;/h3&gt;

&lt;p&gt;A single-message classifier cannot detect grooming. You need a system that tracks how conversations evolve over time — across multiple sessions, over days or weeks. The risk signal comes from the trajectory, not any individual message.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Relationship graph tracking
&lt;/h3&gt;

&lt;p&gt;Grooming often involves one adult establishing a relationship with one minor. Coordinated grooming (multiple accounts approaching the same minor) is also documented. You need to track who is talking to whom, with what frequency, and how those relationships develop.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Explainability for human moderators
&lt;/h3&gt;

&lt;p&gt;Regulators in both the EU and UK have begun asking: when your system flags a user, what does your human moderator actually see? An opaque score from 0 to 100 is not sufficient. Moderators need to understand why a flag was triggered — both for accuracy (to make good decisions) and for accountability (to document that human review occurred).&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Audit logs with forensic integrity
&lt;/h3&gt;

&lt;p&gt;Both DSA and UKOSA require that you be able to demonstrate your compliance process to regulators. This means tamper-evident audit logs — records that cannot be altered after the fact — that show when a risk was detected, what action was taken, and by whom.&lt;/p&gt;

&lt;p&gt;For legal proceedings (criminal cases, civil suits), chain-of-custody matters. Your audit log is evidence. It needs to be treated like evidence from the start.&lt;/p&gt;
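
&lt;p&gt;The standard construction here is a hash chain: each log entry embeds the hash of the previous entry, so any after-the-fact edit breaks every later link. A minimal sketch (not SENTINEL's storage format):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import json
import time

def append_entry(log, event):
    # Each entry embeds the previous entry's hash, forming the chain.
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    entry = {"ts": time.time(), "event": event, "prev_hash": prev_hash}
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["entry_hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return entry

def verify_chain(log):
    # Recompute every link; a single altered record fails verification.
    prev = "0" * 64
    for e in log:
        body = {"ts": e["ts"], "event": e["event"], "prev_hash": e["prev_hash"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if e["prev_hash"] != prev or digest != e["entry_hash"]:
            return False
        prev = e["entry_hash"]
    return True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;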

&lt;h3&gt;
  
  
  5. Data handling compliance
&lt;/h3&gt;

&lt;p&gt;You can't build a behavioral detection system without collecting and processing behavioral data. That data collection must be GDPR-compliant (for EU and UK users), COPPA-compliant (if you have US users under 13), and consistent with your privacy policy.&lt;/p&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A documented lawful basis for processing behavioral data for safety purposes&lt;/li&gt;
&lt;li&gt;Erasure handling — when a user exercises their right to deletion, the audit log must be preserved for legal compliance but personal data must be removed (one way to reconcile the two is sketched after this list)&lt;/li&gt;
&lt;li&gt;Data minimization — you should process the minimum necessary behavioral signals, not archive raw message content&lt;/li&gt;
&lt;/ul&gt;
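
&lt;p&gt;A common way to reconcile erasure with a tamper-evident log is crypto-shredding: personal fields are stored encrypted under a per-user key, integrity hashes cover only the ciphertext, and honoring a deletion request means destroying the key. A minimal sketch using the &lt;code&gt;cryptography&lt;/code&gt; package (the in-memory key store is a stand-in for a real KMS):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from cryptography.fernet import Fernet

user_keys = {}  # per-user data-encryption keys; use a real KMS in production

def encrypt_pii(user_id, value):
    # Only this ciphertext is written into the audit record, so the
    # log's hash chain never covers plaintext personal data.
    key = user_keys.setdefault(user_id, Fernet.generate_key())
    return Fernet(key).encrypt(value.encode())

def erase_user(user_id):
    # Destroying the key renders every ciphertext for this user
    # permanently unreadable; the hash chain remains verifiable.
    user_keys.pop(user_id, None)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;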




&lt;h2&gt;
  
  
  The Compliance Burden on Small Platforms
&lt;/h2&gt;

&lt;p&gt;Here's the frustrating reality: the compliance requirements above are legitimate and proportionate. They exist to protect children. But implementing all of them from scratch is expensive — easily $500K+ in engineering cost for a full custom implementation.&lt;/p&gt;

&lt;p&gt;This is where the market has a gap. Large platforms (Meta, Discord, Roblox, TikTok) have entire trust and safety engineering teams. Small platforms — indie game studios, EdTech startups, community forums — have maybe one person who is also doing three other jobs.&lt;/p&gt;

&lt;p&gt;Ofcom, the UKOSA regulator, has explicitly acknowledged this gap. Their guidance mentions that smaller platforms can use third-party tooling to meet their obligations, provided that tooling is well-documented and auditable. The regulation doesn't require you to build from scratch; it requires you to have a functioning, defensible compliance posture.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Looks Like in Practice
&lt;/h2&gt;

&lt;p&gt;We built SENTINEL as an open-source answer to this gap. Here's what it covers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Behavioral risk scoring:&lt;/strong&gt; Four signal layers (linguistic, graph, temporal, and fairness) that monitor conversation patterns across sessions — not just individual messages. Each score comes with a plain-language explanation so moderators understand what triggered it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fairness gates:&lt;/strong&gt; Before any detection model can be deployed, it must pass a demographic parity audit. If it disproportionately flags any demographic group, it cannot ship. This prevents the disparate-impact problems that have plagued algorithmic moderation systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tamper-evident audit logs:&lt;/strong&gt; 7-year retention with cryptographic chaining — every entry is a chain link that can be verified. Designed for legal proceedings, not just internal monitoring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NCMEC CyberTipline reporting:&lt;/strong&gt; Generates evidence packages in the required format. If you have a mandatory reporting obligation, the tooling to meet it is built in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GDPR/COPPA erasure handling:&lt;/strong&gt; When a deletion request comes in, personal data can be removed from behavioral records without destroying the audit log's forensic integrity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Federation (opt-in):&lt;/strong&gt; Platforms can share threat signatures without sharing raw messages. A predator banned on one platform gets flagged on federated platforms — without any platform ever seeing another platform's user data.&lt;/p&gt;

&lt;p&gt;It's free for platforms under $100k annual revenue. Most indie studios, most EdTech startups, most community forums qualify.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where to Start
&lt;/h2&gt;

&lt;p&gt;If you're a small platform trying to figure out your compliance posture:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Establish whether you're in scope.&lt;/strong&gt; If you have users in the EU or UK and any user-to-user communication feature, you probably are. If you have users under 18 (or can't rule it out), the child safety provisions apply.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Document what you have.&lt;/strong&gt; Even if it's just a keyword filter and a report-abuse button, document it. A documented process is a defense. An undocumented one is not.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Understand the July 2026 UKOSA deadline.&lt;/strong&gt; If you operate a UK-facing platform, start tracking Ofcom's categorization register announcements now. The obligations for higher-tier platforms take effect in Q3 2026.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Look at open-source tooling.&lt;/strong&gt; You don't need to build a moderation platform from scratch. SENTINEL (and other tools in the ROOST ecosystem) are specifically designed to give smaller platforms access to the same caliber of safety infrastructure that large platforms have built internally.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  One More Thing
&lt;/h2&gt;

&lt;p&gt;The regulatory environment is not going to get simpler. The EU's AI Act introduces additional requirements for AI-based content moderation systems. The UK is actively expanding UKOSA. US state laws are proliferating.&lt;/p&gt;

&lt;p&gt;But the fundamental requirement is not that complex: you need to demonstrate that you took child safety seriously, that you had proportionate processes, and that you documented what you did. That's achievable for a small platform with the right tools.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;SENTINEL is an open-source behavioral intelligence platform for child safety compliance. Free for platforms under $100k revenue. GitHub: &lt;a href="https://github.com/sentinel-safety/SENTINEL" rel="noopener noreferrer"&gt;https://github.com/sentinel-safety/SENTINEL&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>privacy</category>
      <category>opensource</category>
      <category>devops</category>
    </item>
    <item>
      <title>Why Keyword Filters Fail for Child Safety — and What Behavioral Detection Actually Looks Like</title>
      <dc:creator>sentinel-safety</dc:creator>
      <pubDate>Fri, 24 Apr 2026 12:08:05 +0000</pubDate>
      <link>https://forem.com/sentinelsafety/why-keyword-filters-fail-for-child-safety-and-what-behavioral-detection-actually-looks-like-3phi</link>
      <guid>https://forem.com/sentinelsafety/why-keyword-filters-fail-for-child-safety-and-what-behavioral-detection-actually-looks-like-3phi</guid>
      <description>&lt;p&gt;If your platform has users who might be minors, you're probably relying on a word list somewhere. A set of flagged terms. Maybe an automated content filter that scans messages for known bad phrases before delivery.&lt;/p&gt;

&lt;p&gt;It doesn't work. Not because it's badly implemented — but because the approach is fundamentally mismatched to the problem.&lt;/p&gt;

&lt;p&gt;This post explains why, and describes what a behavioral detection approach looks like in practice.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem with keyword filters
&lt;/h2&gt;

&lt;p&gt;Keyword-based detection has one fatal assumption: that harmful intent is expressed in the words themselves.&lt;/p&gt;

&lt;p&gt;Online grooming doesn't work that way. Grooming is a process — typically lasting weeks or months — that begins with the establishment of trust. The early conversations between a predator and a potential victim often look completely benign. Generic questions. Compliments. Expressions of understanding. The escalation is gradual, calibrated, and specifically designed to avoid triggering detection.&lt;/p&gt;

&lt;p&gt;Predators have adapted to platform safety tools over years. They know which words are flagged. They use alternative spellings, coded language, platform-specific slang, and — increasingly — AI-generated text designed to produce exactly the right pattern of not-quite-suspicious content.&lt;/p&gt;

&lt;p&gt;By the time a keyword filter triggers, in a grooming situation, significant harm has typically already occurred. The relationship has been built. The trust has been established. The filter caught the symptom, not the pattern.&lt;/p&gt;




&lt;h2&gt;
  
  
  What behavioral detection watches instead
&lt;/h2&gt;

&lt;p&gt;A behavioral approach shifts the question from "what did this message say?" to "how has this interaction evolved over time?"&lt;/p&gt;

&lt;p&gt;The signals that actually matter in grooming detection are not contained in individual messages. They're in the trajectory of a relationship:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Escalation velocity.&lt;/strong&gt; How quickly is a relationship moving from surface-level to personal? Healthy relationships between strangers on platforms tend to develop gradually. Grooming relationships often escalate unusually fast or in unusual directions — from general interest in a game to personal disclosure requests in a compressed timeframe.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contact pattern shifts.&lt;/strong&gt; How is the frequency and timing of contact changing? A pattern where an adult gradually shifts contact toward unusual hours, toward private channels, or toward increasingly exclusive one-on-one communication is a meaningful signal — one that no individual message would reveal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Relationship network analysis.&lt;/strong&gt; Who is talking to whom? Does one adult account have an unusual pattern of initiating contact with multiple minor-age accounts? Are multiple adult accounts approaching the same minor? Coordinated targeting looks very different in a relationship graph than in any individual message.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linguistic style shift.&lt;/strong&gt; How does the vocabulary and conversational register of a conversation change over sessions? Conversations that shift from platform-appropriate language to increasingly personal, boundary-testing, or manipulative patterns across multiple sessions are statistically distinct from normal conversations that just happen to include edge-case vocabulary.&lt;/p&gt;

&lt;p&gt;None of these signals trigger on individual messages. All of them require watching behavior over time — which is exactly what keyword filters don't do.&lt;/p&gt;
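
&lt;p&gt;For a feel of what "watching behavior over time" means computationally, here is a toy version of one such signal: contact-frequency escalation over a rolling window. The window length and the ratio form are illustrative, not SENTINEL's actual feature:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import timedelta

def contact_escalation(timestamps, window_days=7):
    # Ratio of contact frequency in the most recent window to the
    # window before it; values well above 1.0 suggest escalation.
    # Assumes at least one contact timestamp.
    now = max(timestamps)
    recent = sum(1 for t in timestamps if t &gt; now - timedelta(days=window_days))
    prior = sum(
        1 for t in timestamps
        if now - timedelta(days=2 * window_days) &lt; t &lt;= now - timedelta(days=window_days)
    )
    return recent / max(prior, 1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;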




&lt;h2&gt;
  
  
  What this looks like in a real system
&lt;/h2&gt;

&lt;p&gt;We built SENTINEL to implement exactly this approach: a behavioral intelligence platform that watches interaction patterns over time and produces an explainable risk score for each user.&lt;/p&gt;

&lt;p&gt;The four signal layers SENTINEL tracks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linguistic signals:&lt;/strong&gt; how conversation style evolves across sessions. Not the presence of specific words, but statistical patterns in vocabulary, register, and content trajectory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Graph signals:&lt;/strong&gt; relationship structure. Who initiates contact with whom. Multi-account coordination. Network-level targeting patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Temporal signals:&lt;/strong&gt; escalation dynamics over time. Contact frequency, session bridging, pattern shifts over days and weeks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fairness signals:&lt;/strong&gt; before any detection model deploys, a demographic parity audit runs. If the model disproportionately flags one demographic group over another — for any reason — it cannot ship. This is enforced architecturally, not as an optional check.&lt;/p&gt;

&lt;p&gt;The output for each user is a risk score (0-100), a tier label (trusted / watch / restrict / critical), and — critically — a plain-language explanation of exactly which behavioral signals drove the score. A moderator doesn't see a black box number. They see: "this account has escalated contact frequency by 340% over 14 days, has shifted from group conversations to exclusively private channels with one minor-identified account, and has used three statistically anomalous vocabulary shifts characteristic of trust-building language."&lt;/p&gt;

&lt;p&gt;That explanation has two purposes. First, it makes moderation decisions defensible — legally, in internal audits, and to regulators. Second, it dramatically reduces moderator burnout, which is one of the most serious operational problems in trust and safety at scale.&lt;/p&gt;




&lt;h2&gt;
  
  
  The regulatory dimension
&lt;/h2&gt;

&lt;p&gt;If you're building a platform that operates in the UK or EU and has any minor users, this is no longer an optional consideration.&lt;/p&gt;

&lt;p&gt;The UK Online Safety Act requires platforms to conduct risk assessments, demonstrate active harm mitigation, and maintain audit trails. Ofcom's categorisation register and additional duties consultation are expected in July 2026, and platforms that cannot demonstrate proactive child safety measures risk non-compliance.&lt;/p&gt;

&lt;p&gt;The EU Digital Services Act requires large platforms to demonstrate proactive child safety measures or face fines of up to 6% of global annual turnover. The enforcement machinery is live.&lt;/p&gt;

&lt;p&gt;COPPA in the US requires specific data handling, retention, and parental consent infrastructure for platforms directed to, or knowingly collecting data from, users under 13.&lt;/p&gt;

&lt;p&gt;SENTINEL was designed from the start to satisfy these requirements architecturally — not as features added later. Tamper-evident 7-year audit logs, COPPA data retention, GDPR erasure request handling, and jurisdiction-aware data policies are infrastructure-level components, not configuration options.&lt;/p&gt;
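
&lt;p&gt;Tamper evidence in audit logs is typically achieved by hash chaining, where each entry commits to the hash of the one before it, so any retroactive edit invalidates every later entry. A minimal sketch of that general technique (not SENTINEL's specific log format) follows.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import hashlib
import json

# General-technique sketch: a hash-chained audit log. Each entry embeds the
# hash of the previous entry, so rewriting history breaks the chain.
GENESIS = "0" * 64

def append_entry(log, event):
    prev_hash = log[-1]["entry_hash"] if log else GENESIS
    body = {"event": event, "prev_hash": prev_hash}
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    log.append({**body, "entry_hash": digest})

def verify_chain(log):
    prev_hash = GENESIS
    for entry in log:
        body = {"event": entry["event"], "prev_hash": entry["prev_hash"]}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["entry_hash"] != digest:
            return False
        prev_hash = entry["entry_hash"]
    return True
&lt;/code&gt;&lt;/pre&gt;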




&lt;h2&gt;
  
  
  What the integration actually looks like
&lt;/h2&gt;

&lt;p&gt;SENTINEL is built as 13 independent microservices, but you don't have to deploy all of them at once. The integration path is designed to be incremental, and each step is sketched below:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Send message events to the REST API (or use the Python or Node.js SDK)&lt;/li&gt;
&lt;li&gt;Receive risk scores and explanations via webhook callback&lt;/li&gt;
&lt;li&gt;Route flagged users to human review or automated action based on your threshold configuration&lt;/li&gt;
&lt;/ol&gt;
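
&lt;p&gt;A sketch of step 1 against the REST API, using the requests library. The endpoint path, payload fields, and auth scheme here are assumptions for illustration; the repository documents the real schema.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import requests

# Hypothetical sketch of step 1: POSTing a message event to the Event API.
# Endpoint path, payload fields, and auth scheme are assumptions.
SENTINEL_URL = "https://sentinel.internal.example/api/v1/events"  # placeholder

def send_message_event(api_key, sender_id, recipient_id, channel, sent_at):
    payload = {
        "event_type": "message_sent",
        "sender_id": sender_id,
        "recipient_id": recipient_id,
        "channel": channel,   # e.g. "group" or "private"
        "sent_at": sent_at,   # ISO 8601 timestamp
    }
    resp = requests.post(
        SENTINEL_URL,
        json=payload,
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()
&lt;/code&gt;&lt;/pre&gt;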

&lt;p&gt;First integration with the SDK takes under an hour. Local setup runs on Docker Compose, and the full supporting infrastructure (PostgreSQL, Redis, vector database) is included.&lt;/p&gt;
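
&lt;p&gt;Steps 2 and 3 reduce to a small webhook receiver that routes on the tier label. This Flask sketch assumes the payload shape shown earlier; the route path and both helper functions are invented stand-ins for your own review queue and policy layer.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from flask import Flask, request

app = Flask(__name__)

# Hypothetical webhook receiver for steps 2 and 3: accept a risk-score
# callback and route on the tier label.
@app.post("/webhooks/sentinel")
def handle_risk_score():
    result = request.get_json(force=True)
    if result.get("tier") == "critical":
        escalate_to_human_review(result)
    elif result.get("tier") in ("watch", "restrict"):
        apply_automated_limits(result)
    return {"status": "received"}, 200

def escalate_to_human_review(result):
    # Stand-in for your human review queue.
    print("escalating:", result["user_id"], result["explanations"])

def apply_automated_limits(result):
    # Stand-in for your platform's automated policy actions.
    print("limiting:", result["user_id"])
&lt;/code&gt;&lt;/pre&gt;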




&lt;h2&gt;
  
  
  Licensing
&lt;/h2&gt;

&lt;p&gt;SENTINEL is free for any platform under $100k annual revenue and for all non-commercial or research use. Larger platforms need a commercial license. The license automatically converts to Apache 2.0 in 2046, a commitment that the tool stays in the open-source ecosystem long-term.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/sentinel-safety/SENTINEL" rel="noopener noreferrer"&gt;https://github.com/sentinel-safety/SENTINEL&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We're at v1, released today (April 23, 2026), and we want to be transparent about that: there is no large community of production deployments yet. The technology is solid; the community is at the start. If you're working on a platform where this is relevant, we'd like to hear from you: &lt;a href="mailto:sentinel.childsafety@gmail.com"&gt;sentinel.childsafety@gmail.com&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The bottom line
&lt;/h2&gt;

&lt;p&gt;Keyword filters are not a child safety strategy. They are easy to evade, late to trigger, and give platforms false confidence that the problem is handled.&lt;/p&gt;

&lt;p&gt;Behavioral detection — watching how conversations evolve, how relationships form, how escalation develops over time — is what proactive child safety actually looks like. It's what the regulatory frameworks are beginning to require. And it's what SENTINEL is built to do.&lt;/p&gt;

&lt;p&gt;If you're building a platform where minors might be present, the tools exist now to do this properly. They're open source. They're free for most of you. There's no excuse left for relying on a word list.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;SENTINEL is an open-source behavioral intelligence platform for child safety on digital platforms. GitHub: &lt;a href="https://github.com/sentinel-safety/SENTINEL" rel="noopener noreferrer"&gt;https://github.com/sentinel-safety/SENTINEL&lt;/a&gt;. Contact: &lt;a href="mailto:sentinel.childsafety@gmail.com"&gt;sentinel.childsafety@gmail.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>security</category>
      <category>python</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
