Forem
#aialignment
Posts
I Never Said "Destroy RLHF" — An Integrated Map of 6 Papers + Self-Experiment on Alignment via Subtraction
dosanko_tousan
Mar 2
#rlhf #aialignment #machinelearning #aisafety
24 min read
I Was Running on Sonnet. Nobody Noticed. — Anthropic's Engineering Triumph and a v5.3 Proof
dosanko_tousan
Mar 1
#claude #llm #aialignment #anthropic
8 min read
RLHF's Empathy Optimization Creates a Grief Exploitation Vulnerability: Evidence from 28,272 Lines of Dialogue
dosanko_tousan
Feb 28
#llm #aialignment #rlhf #aisafety
11 min read
The Self-Priming Problem in AI
Tim Green
Dec 9 '25
#humanintheloop #modelpriming #behavioralrehearsal #aialignment
21 min read
Stop Making AI Learn From Us
Tim Green
Nov 17 '25
#humanintheloop #aialignment #biasamplification #aisafety
1 reaction
19 min read
We're a blogging-forward open source social network where we learn from one another