Discussion Group

27.

Emergent Alignment via Competition

Natalie Collina, Surbhi Goel, Aaron Roth, Emily Ryu, and Mirah Shi

2026

26.

Truthfulness Despite Weak Supervision: Evaluating and Training LLMs Using Peer Prediction

Tianyi Alex Qiu, Micah Carroll, and Cameron Allen

2026

25.

Imperfect Recall and AI Delegation

Eric Olav Chen, Alexis Ghersengorin, and Sami Petersen

2024

24.

Jackpot! Alignment as a Maximal Lottery

Roberto-Rafael Maura-Rivero, Marc Lanctot, Francesco Visin, and Kate Larson

2025

23.

Conservative Agency via Attainable Utility Preservation

Alex Turner, Dylan Hadfield-Menell, and Prasad Tadepalli

2019

22.

The Shutdown Problem: Incomplete Preferences as a Solution

Elliott Thornley

2024

21.

Natural Selection of Artificial Intelligence

Jeffrey Ely and Balazs Szentes

2023

20.

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Stephen Casper et al.

2023

19.

A Theory of Rule Development

Glenn Ellison and Richard Holden

2014

18.

Evolution of Preferences

Eddie Dekel, Jeffrey Ely, and Okan Yilankaya

2007

17.

Hidden Incentives for Auto-Induced Distributional Shift

David Krueger, Tegan Maharaj, and Jan Leike

2020

16.

Misspecification in Inverse Reinforcement Learning

Joar Skalse and Alessandro Abate

2022

15.

A Robust Bayesian Truth Serum for Small Populations

Jens Witkowski and David C. Parkes

2012

14.

Quantilizers: A Safer Alternative to Maximizers for Limited Optimization

Jessica Taylor

2016

Safety Considerations for Online Generative Modeling

Sam Marks

2022

13.

Safe Pareto Improvements for Delegated Game Playing

Caspar Oesterheld and Vincent Conitzer

2021

12.

Functional Decision Theory: A New Theory of Instrumental Rationality

Eliezer Yudkowsky and Nate Soares

2017

11.

Learning to Communicate with Deep Multi-Agent Reinforcement Learning

Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, and Shimon Whiteson

2016

Emergent Covert Signaling in Adversarial Reference Games

Dhara Yu, Jesse Mu, and Noah Goodman

2022

10.

Getting Dynamic Implementation to Work

Yi-Chun Chen, Richard Holden, Takashi Kunimoto, Yifei Sun, and Tom Wilkening

2018

9.

Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda

Jesse Clifton

2020

8.

Investment Incentives in Truthful Approximation Mechanisms

Mohammad Akbarpour, Scott Kominers, Kevin Li, Shengwu Li, and Paul Milgrom

2020

7.

Corrigibility

Nate Soares, Benja Fallenstein, Eliezer Yudkowsky, and Stuart Armstrong

2015

The Off-Switch Game

Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, and Stuart Russell

2016

6.

Fully General Online Imitation Learning

Michael Cohen, Marcus Hutter, and Neel Nanda

2021

5.

Model-Free Opponent Shaping

Chris Lu, Timon Willi, Christian Schroeder de Witt, and Jakob Foerster

2022

The Good Shepherd: An Oracle Agent for Mechanism Design

Jan Balaguer, Raphael Koster, Christopher Summerfield, and Andrea Tacchetti

2022

4.

Discovering Agents

Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, Matt MacDermott, and Tom Everitt

2022

3.

Decision Scoring Rules (Extended Version)

Caspar Oesterheld and Vincent Conitzer

2020

2.

Risks from Learned Optimization in Advanced Machine Learning Systems

Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant

2019

1.

The Principal-Agent Alignment Problem in Artificial Intelligence

Dylan Hadfield-Menell

2021

Incomplete Contracting and AI Alignment

Dylan Hadfield-Menell and Gillian Hadfield

2018

Past Papers