Aisafety

Stephen Trembley

May 9

Building a Compliant AI Agent System: Lessons from 347 Production Agents

#ai #compliance #aisafety #enterpriseai

5 min read

Ebikara Spiff ᴀɪᴄᴍᴄ

May 2

The Sovereign Safety Gap: Why AI Alignment Must be Contextual.

#aisafety #ai #aigovernance #globalsouth

3 min read

Kunal

Apr 29

AI Agent Failure in Production: 5 Patterns That Would Have Prevented the PocketOS Database Disaster [2026]

#aiagents #aisafety #postmortem #devops

8 min read

Kunal

Apr 16

Data Poisoning by Insiders: Why Employees Are Deliberately Sabotaging Corporate AI [2026]

#aisafety #datapoisoning #insiderthreat #datagovernance

7 min read

Kunal

Apr 15

Deceptive Alignment in LLMs: Anthropic's Sleeper Agents Paper Is a Fire Alarm for AI Developers [2026]

#aisafety #anthropic #llm #deceptivealignment

7 min read

Simon Paxton

Apr 10

AI liability: Illinois’ Bill Could Turn Reports Into Immunity

#airegulation #openai #aisafety #illinois

8 min read

Laurent DeSegur

Apr 9

Functional Emotions and Production Guardrails: What Interpretability Research Means for Claude Code

#aisafety #claudecode #interpretability #aiagents

13 min read

Simon Paxton

Apr 7

The Indianapolis Data Center Shooting Is a Local Bug Report

#datacenters #cybersecurity #aisafety #techpolicy

8 min read

Rishabh Sethia

Apr 6

Anthropic Found Emotions Inside Claude. Here's What That Actually Means for AI.

#ai #claude #anthropic #aisafety

10 min read

Simon Paxton

Apr 5

Public Misconceptions About AI Are Breaking the Wrong Things

#machinelearning #aiethics #aisafety #chatgpt

8 min read

Tom Lee

Mar 31

NeurIPS 2025 Proved It: Every LLM Says the Same Thing — Here's the Fix

#soulspec #persona #aisafety #research

4 min read

Laurent Laborde

Apr 3

Zero-Shot Attack Transfer on Gemma 4 (E4B-IT)

#aisafety #ai

3 min read

Laurent Laborde

Apr 3

Would you tell me if you turned evil ?

#discuss #ai #aisafety

16 min read

Simon Paxton

Mar 29

Greg Brockman Donation Shows AI Safety Is Political

#openai #anthropic #airegulation #aisafety

6 min read

Simon Paxton

Mar 28

Anthropic Data Leak: How Ops Failures Undermine AI Safety

#anthropic #databreach #cybersecurity #aisafety

7 min read
