Research program on making advanced AI systems reliably do what humans intend, using approaches such as provable behavioral guarantees in model-based reinforcement learning agents, zero-shot cooperation in RL systems, and interpretability of what models are learning.
Endorsements support Cavendish Labs.
People: no linked people
Updated 05/18/26 · By grantmaking.ai
Grants Received: no grants recorded