Testing failure modes of debate-style AI control schemes | grantmaking.ai