Research project culminating in a NeurIPS 2024 paper that constructs cryptographic backdoors in transformer language models which remain unelicitable and very hard to detect or evaluate, even with full white-box access, challenging standard AI safety evaluation methods.
Endorsements support Contramont Research.
People
Updated 05/18/26 · By grantmaking.ai
Grants Received
No grants recorded