The Problem
It is hard to adapt and deploy GenAI across diverse settings and communities.
• Generative AI models reproduce the human biases in their training data and further amplify them through mode collapse.
• We lack reliable ways to increase meaningful diversity in LLMs, and post-training alignment actively reduces diversity as a side effect.
• The loss of diversity and the amplification of social biases both harm people at the margins and narrow the human experience.
This is the LLM Homogenization problem.
The Vision
• A world where AI serves all communities, not only dominant ones
The Mission
• Mitigate harm to the minoritized
• Empower people based on their uniqueness
The Proposal
• Let's develop interventions for LLMs, using mechanistic interpretability (Mech Interp) and representation engineering (Repr Engr), so that LLMs can be integrated safely: reducing harms from social biases and ensuring their behavior is appropriately contextual and situated to the needs of many different communities.
What is this Project?
This project aims to advance research that mitigates LLM Homogenization.
More specifically, this includes:
- Research collaborations and fellowships
I am planning to propose projects for the upcoming AISC and SPAR rounds.
The WIP project proposal is here: https://app.notion.com/p/unruly-intimacies/Interpreting-Homogenization-and-Diversity-in-LLMs-3589b6affeb481499774fbe59191a8b5.
Additionally, I support an ongoing collaboration with Queer In AI that investigates LLM bias against LGBTQ+ people.
The goal is to build a research community that focuses on the homogenization problem and takes diversity in LLMs seriously.
The funds will primarily go towards compute, conference travel, and other expenses that support the projects I mentor and collaborate on.
In six months, as concrete output, I expect a few papers to be ready for submission to conferences (NeurIPS, ICLR, ICML, etc.) and a cohort of early-stage researchers eager to continue this type of work.
- My own independent research (+ career transition)
My own work has already made progress on this research stream.
At the Technical AI Safety Conference (TAIS) 2026 in Oxford, I presented a paper that formalizes LLM Homogenization: https://arxiv.org/pdf/2601.06116.
For the past few months, all my independent research has been self-funded.
The secondary use of funds will be to extend the runway of my independent work until I find a more permanent home for it.
In six months, I expect to either:
• Find a full-time position at a frontier lab or a non-profit where I can continue this research
• Create my own organization that supports this research
What perspective do I bring?
I believe my proposed research stream differs from previous efforts in several ways:
• Transdisciplinarity
I am proposing research that actively engages with commonly overlooked fields such as Queer Theory, Psychoanalysis, and Category Theory.
To truly create Participatory and Responsible AI, we need to consider insights from multiple disciplines. I believe these fields provide the abstractions and ideas needed to advance AI Safety.
• Applied Interpretability
I believe that many recently developed methodologies can be applied to the LLM Homogenization problem. For example, these could include probes that monitor bias at inference time, AI control protocols that ensure diverse generation, and maps of the representational geometry of gender and racial bias.
• Locality
Users and developers care more about a model's local behavior than its global behavior. What matters is not whether the model is "broadly unbiased", but whether it behaves without social bias within the specific conditions of a given deployment.
I believe we can exploit the locality and constraints of particular use cases to guarantee "good" behavior.
Sample possible papers:
• Post-training Homogenization Evaluation
We know post-training alignment broadly reduces diversity in LLMs. This paper would establish evaluations to measure homogenization introduced by post-training and ask frontier labs to report it in their LLM system cards.
• Predicting Social Bias from Activation Space
Previous research (https://arxiv.org/pdf/2511.04527) has shown that outcome distributions can be predicted from hidden activations. Could we leverage this to predict when an LLM will produce distributions shaped by social bias? Could we use such probes during decoding to ensure LLM output distributions are free of social bias?
• Diversity Functional Concepts
Recent research has identified functional concepts. Could we identify a functional concept in activation space that generally promotes creativity and diversity?
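As a rough illustration of where the "Post-training Homogenization Evaluation" idea could start, the sketch below compares the lexical diversity of sampled completions from a base model and its post-trained counterpart for the same prompt. It is a hypothetical sketch, not the planned evaluation: it assumes we already have several sampled completions per model, and it uses two common lexical measures (distinct-n and mean pairwise Jaccard distance over word sets). The function names and toy samples are made up for illustration. A real evaluation would likely also need semantic, embedding-based diversity measures and many prompts.

```python
from itertools import combinations


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(samples, n=2):
    """Fraction of all n-grams across the samples that are unique (distinct-n).

    Lower values mean the samples reuse the same phrasing more often.
    """
    all_ngrams = []
    for text in samples:
        all_ngrams.extend(ngrams(text.lower().split(), n))
    if not all_ngrams:
        return 0.0
    return len(set(all_ngrams)) / len(all_ngrams)


def mean_pairwise_jaccard_distance(samples):
    """Average (1 - Jaccard similarity) between the word sets of every pair of samples."""
    word_sets = [set(text.lower().split()) for text in samples]
    pairs = list(combinations(word_sets, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        union = a | b
        similarity = len(a & b) / len(union) if union else 1.0
        total += 1.0 - similarity
    return total / len(pairs)


def homogenization_report(base_samples, post_samples, n=2):
    """Hypothetical report: diversity before and after post-training on one prompt.

    A positive "drop" means the post-trained model's samples are less diverse.
    """
    metrics = {
        "distinct_n": lambda s: distinct_n(s, n),
        "pairwise_jaccard_distance": mean_pairwise_jaccard_distance,
    }
    report = {}
    for name, metric in metrics.items():
        base, post = metric(base_samples), metric(post_samples)
        report[name] = {"base": base, "post": post, "drop": base - post}
    return report


# Toy samples standing in for completions of one prompt (not real model outputs).
base = [
    "the baker in the village hums old songs",
    "a nurse from lagos writes poetry at night",
    "my neighbor repairs bicycles and grows chilies",
]
post = [
    "the character is a kind and helpful person",
    "the character is a kind and caring person",
    "the character is a kind and helpful person",
]
report = homogenization_report(base, post)
for metric_name, values in report.items():
    print(metric_name, round(values["drop"], 3))
```

Reporting the drop per metric, rather than a single score, would make it easier to see which kind of diversity post-training removes; averaging such reports over a fixed prompt set is one possible way labs could include the numbers in a system card.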