Redwood’s flagship research program, which develops and empirically stress-tests AI control protocols: safety pipelines meant to keep powerful AI systems from subverting oversight even if the models are misaligned and intentionally trying to do so.
Endorsements support Redwood Research.
People: no linked people
Updated 05/18/26 · By grantmaking.ai
Project Details
The AI Control Research Program designs and evaluates control protocols for advanced AI systems under adversarial threat models. In their ICML oral paper "AI Control: Improving Safety Despite Intentional Subversion," Redwood researchers study techniques that use weaker trusted models to oversee stronger untrusted models. Subsequent work applies these ideas in environments such as BashArena, BashBench, and LinuxArena, and aims to inform best practices for AI labs and policymakers.
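To make the idea of a weaker trusted model overseeing a stronger untrusted one more concrete, here is a minimal sketch of one protocol in this family, loosely in the style of trusted monitoring with deferral. It is an illustration under stated assumptions, not Redwood's implementation: the function names, the `Decision` type, and both thresholds are hypothetical, and the models are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    action: str       # "accept", "defer", or "audit"
    output: str       # what the pipeline would actually use
    suspicion: float  # score assigned by the trusted monitor


def trusted_monitoring(
    task: str,
    untrusted_solve: Callable[[str], str],       # stronger model, possibly adversarial
    trusted_solve: Callable[[str], str],         # weaker model, assumed honest
    trusted_score: Callable[[str, str], float],  # trusted monitor: suspicion in [0, 1]
    audit_threshold: float = 0.9,                # hypothetical value
    defer_threshold: float = 0.6,                # hypothetical value
) -> Decision:
    """Route one task through a simple control protocol (illustrative only)."""
    candidate = untrusted_solve(task)
    suspicion = trusted_score(task, candidate)

    if suspicion >= audit_threshold:
        # Most suspicious outputs are held for (scarce) human review.
        return Decision("audit", candidate, suspicion)
    if suspicion >= defer_threshold:
        # Moderately suspicious: discard the candidate, use the trusted model.
        return Decision("defer", trusted_solve(task), suspicion)
    # Low suspicion: use the untrusted model's more capable output.
    return Decision("accept", candidate, suspicion)
```

A stress test in this spirit would swap `untrusted_solve` for a deliberately adversarial policy and measure how often harmful outputs reach the "accept" branch, alongside how much task performance is lost to deferral.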
Grants Received: no grants recorded