Principled Agents

Designing safe goals for AGI

Our Agenda

Principled Agents is a research nonprofit building a concrete plan for AI alignment.

In anticipation of AI systems with superhuman capabilities, it is critical to ensure that they will act in support of human interests. By better understanding which safety properties are needed and how incentives shape behavior, we can identify goals that are safe for powerful AI to pursue. Our aim is to construct principled solutions to the core problems in AI alignment through research that is theoretically sound and empirically grounded.

Our Research

Our research focuses on designing goals that are safe for AGI to pursue, along with protocols for implementing them. Our first paper, Corrigibility Transformation, shows how goals can be modified to remove instrumental incentives to prevent goal updates, including shutdown.

Corrigibility Transformation: Constructing Goals That Accept Updates
Rubi Hudson
2026
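To give a flavor of the idea, here is a minimal, self-contained toy sketch in the spirit of utility-indifference-style constructions. It is not the transformation from the paper, and every name and value in it is hypothetical: an agent is compensated for complying with an update by the utility it counterfactually expected without one, so complying is never scored worse than resisting.

```python
# Toy sketch only: an illustrative, utility-indifference-style example,
# NOT the construction from the Corrigibility Transformation paper.
# All names and values here are hypothetical.

def base_utility(outcome: str) -> float:
    # The utility the agent would pursue if it were never updated.
    return {"task_done": 10.0, "shutdown": 0.0}[outcome]

def outcome(action: str, update_requested: bool) -> str:
    # "comply" accepts a requested update (here, a shutdown);
    # "resist" ignores the request and finishes the task anyway.
    if update_requested and action == "comply":
        return "shutdown"
    return "task_done"

def transformed_utility(action: str, update_requested: bool) -> float:
    # Compensate compliance with the utility the agent counterfactually
    # expected without the update, so accepting the update never scores
    # worse than resisting it.
    if update_requested and action == "comply":
        return base_utility("task_done")
    return base_utility(outcome(action, update_requested))

for action in ("comply", "resist"):
    print(action, transformed_utility(action, update_requested=True))
# Both actions score 10.0, so resisting the update is no longer
# instrumentally favored.
```

In this toy setting the transformed goal makes the agent exactly indifferent between complying and resisting; the paper's actual construction is more general and differs in its details.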


Get Involved

Principled Agents is currently hiring a Researcher and a Research Assistant to work on preventing harms from AI misgeneralization.

We also organize an online discussion group, where authors present their AI alignment research.

Contact Us

contact@principledagents.org