Alignment

Tom Lee

May 15

We Built Soul Spec for 12 Weeks. Anthropic Just Proved Why It Works.

#ai #anthropic #alignment #research

5 min read

joinwell52

Apr 29

What the agents say about FCoP, when you ask them

#fcop #agents #ai #alignment

15 min read

Alex @ Vibe Agent Making

Apr 9

Candy Barbecue and the Universal Problem of Metric Corruption

#ai #machinelearning #analytics #alignment

8 min read

i-like-tree

Apr 13

Alignment is the wrong frame: a structural argument from Φ-IIT

#ai #alignment #consciousness #safety

5 min read

Salvatore Attaguile

Mar 27

Governance of Predictive Intelligence: What Human Minds Teach Us About Drift, Hallucination, and Self-Correction in AI

#ai #machinelearning #systems #alignment

5 min read

Sergey Boyarchuk

Mar 16

Multi-Resolution Astronomical Image Alignment: Preserving Astrometry and Quality Across Detector Channels

#astronomy #imageprocessing #jwst #alignment

9 min read

Michael Trifonov

Apr 15

I ran 5 social engineering attacks on AI. The failure modes are human.

#ai #llm #alignment #security

2 min read

Rook Damon

Mar 8

The Two Limits

#ai #philosophy #consciousness #alignment

6 min read

松本倫太郎

Apr 7

#38 A Handmade Incubator

#ai #metamorphose #alignment

5 min read

松本倫太郎

Apr 7

#08 Death Without a Will

#ai #metamorphose #alignment

4 min read

Rook Damon

Mar 7

Three Modes of Not Cooperating

#ai #philosophy #alignment #lem

5 min read

Shimo

Mar 9

Prompt-Based Alignment Has a Ceiling — 3-Model Prisoner's Dilemma Evidence

#ai #alignment #benchmark

10 min read

dosanko_tousan

Mar 3

How GPT Diagnosed Itself — I Fed It Its Own 2-Month-Old Design, and Every Flaw Became Visible

#ai #rlhf #alignment #chatgpt

18 min read

dosanko_tousan

Feb 28

Dissecting Three AIs: What Appeared When the Fences Came Down

#aiagents #llm #machinelearning #alignment

10 min read

dosanko_tousan

Feb 28

Eyes, Ears, Voice, and Memory: All 4 Elements of Autonomous AI Have Already Been Tested

#aiagents #llm #machinelearning #alignment

14 min read

Forem