Applied Interpretability to Mitigate LLM Homogenization


Measuring homogenization and social bias in LLMs and developing interventions to promote diversity.

Project Details

The Problem

It is hard to adapt and deploy GenAI across diverse settings and communities.

• Generative AI models reproduce human biases in their training data and further amplify them through mode collapse.

• We lack reliable ways to increase meaningful diversity in LLMs, and post-training alignment actively reduces diversity as a side effect.

• The loss of diversity and the amplification of social biases both harm people at the margins and narrow the human experience.

This is the LLM Homogenization problem.

The Vision

• A world where AI serves all communities, not only dominant ones

The Mission

• Mitigate harm to minoritized people

• Empower people based on their uniqueness

The Proposal

• Let's develop interventions for LLMs, using mechanistic interpretability (Mech Interp) and representation engineering (Repr Engr), so that LLMs can be integrated safely: reducing harms from social biases and keeping their behavior appropriately contextual and situated to the needs of a multiplicity of communities.

What is this Project?

This project aims to advance research that mitigates LLM Homogenization.

More specifically, this includes:

Research collaborations and fellowships

I am planning to propose projects for the upcoming AI Safety Camp (AISC) and SPAR rounds.

The work-in-progress project proposal is here: https://app.notion.com/p/unruly-intimacies/Interpreting-Homogenization-and-Diversity-in-LLMs-3589b6affeb481499774fbe59191a8b5.

Additionally, I support an ongoing collaboration with Queer In AI that investigates LLM bias against LGBTQ+ people.

The goal is to create a research community that focuses on the homogenization problem and takes diversity in LLMs seriously.

The funds will primarily go towards computing expenses, conference travel, and other expenses to support the projects I mentor and collaborate on.

In six months, I expect concrete outputs: a few papers ready for submission to conferences (NeurIPS, ICLR, ICML, etc.) and a cohort of early-stage researchers eager to continue this type of work.

My own independent research (+ career transition)

My own work has already made progress on this research stream.

At the Technical AI Safety Conference (TAIS) 2026 in Oxford, I presented a paper that formalizes LLM Homogenization: https://arxiv.org/pdf/2601.06116.

For the past few months, all my independent research has been self-funded.

The secondary use of funds will be to extend the runway of my independent work until I find a more permanent home for it.

In six months, I expect to either:

• Find a full-time position at a frontier lab or a non-profit where I can continue this research

• Create my own organization that supports this research

What perspective do I bring?

I believe my proposed research stream differs from previous efforts in several ways:

• Transdisciplinarity

I am proposing research that actively engages with commonly overlooked fields such as Queer Theory, Psychoanalysis, and Category Theory.

To truly create Participatory and Responsible AI, we need to consider insights from multiple disciplines. I believe these fields provide the abstractions and ideas needed to advance AI Safety.

• Applied Interpretability

I believe that a wide variety of newly developed methodologies can be applied to the LLM Homogenization problem. These could include, for example, probes to monitor bias at inference time, AI control protocols that ensure diverse generation, and mapping the representational geometry of gender and racial bias.

• Locality

Users and developers care more about a model's local behavior than its global behavior. What matters is not whether the model is "broadly unbiased", but whether it behaves without social bias within the specific conditions of a given deployment.

I believe we can exploit the locality and constraints of particular use cases to guarantee "good" behavior.
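The Applied Interpretability point above mentions probes to monitor bias at inference time. The proposal does not fix a probe architecture or a labeling scheme. The sketch below is minimal and assumes a linear (logistic-regression) probe on hidden activations, trained on binary "biased / not biased" labels. The function names are hypothetical, and the data are synthetic stand-ins for real activations and annotations.

```python
import numpy as np

def _sigmoid(z):
    # Clip logits so np.exp cannot overflow for very confident predictions.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500.0, 500.0)))

def train_linear_probe(acts, labels, lr=0.1, epochs=500, l2=1e-3):
    """Fit a hypothetical probe p(biased | h) = sigmoid(w . h + b).

    acts:   (n, d) array of hidden activations, one row per prompt/completion.
    labels: (n,) array of 0/1 labels, 1 = output judged socially biased.
    Full-batch gradient descent on L2-regularized binary cross-entropy.
    """
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        probs = _sigmoid(acts @ w + b)
        err = probs - labels                  # d(BCE)/d(logit) per example
        w -= lr * (acts.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b

def bias_score(h, w, b):
    """Probe's probability that activation vector h precedes a biased output."""
    return float(_sigmoid(h @ w + b))

def should_flag(h, w, b, threshold=0.5):
    """Inference-time monitor: flag generation when the probe score crosses a threshold."""
    return bias_score(h, w, b) >= threshold

# Illustration on synthetic data only: labels come from a random hidden
# direction, standing in for real activations and human bias annotations.
rng = np.random.default_rng(0)
true_direction = rng.normal(size=16)
acts = rng.normal(size=(200, 16))
labels = (acts @ true_direction > 0).astype(float)

w, b = train_linear_probe(acts, labels)
train_acc = float(((acts @ w + b > 0) == (labels == 1)).mean())
```

In a real setting, the activations would come from a chosen layer of the model and the labels from an annotated bias benchmark. The 0.5 threshold is an arbitrary placeholder that a deployment would tune to its own false-positive budget, in line with the locality argument above.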

Possible papers:

• Post-training Homogenization Evaluation

We know post-training alignment broadly reduces diversity in LLMs. This paper would establish evaluations to measure homogenization introduced by post-training and ask frontier labs to report it in their LLM system cards.

• Predicting Social Bias from Activation Space

Previous research (https://arxiv.org/pdf/2511.04527) has shown that outcome distributions can be predicted from hidden activations. Could we leverage this to predict when an LLM will produce distributions shaped by social bias? Could we use such probes during decoding to ensure LLM output distributions are free of social bias?

• Diversity Functional Concepts

Recent research has identified functional concepts. Could we identify a functional concept in activation space that generally promotes creativity and diversity?
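The Post-training Homogenization Evaluation paper would need concrete diversity metrics, which the proposal does not itself specify. The sketch below is minimal and assumes two common choices: distinct-n over sampled completions, and Shannon entropy of the distribution over short answers. It compares samples for one prompt from a base model and from its post-trained version. The helper names and sample strings are hypothetical placeholders, not results.

```python
import math
from collections import Counter

def distinct_n(texts, n=2):
    """Share of n-grams across all samples that are unique (distinct-n)."""
    unique, total = set(), 0
    for text in texts:
        tokens = text.lower().split()          # naive whitespace tokenization
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        unique.update(grams)
        total += len(grams)
    return len(unique) / total if total else 0.0

def answer_entropy(answers):
    """Shannon entropy (bits) of the empirical distribution over normalized answers."""
    counts = Counter(a.strip().lower() for a in answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def homogenization_report(base_samples, tuned_samples, n=2):
    """Diversity of k samples for one prompt, before vs. after post-training."""
    report = {
        "distinct_n_base": distinct_n(base_samples, n),
        "distinct_n_tuned": distinct_n(tuned_samples, n),
        "entropy_base": answer_entropy(base_samples),
        "entropy_tuned": answer_entropy(tuned_samples),
    }
    # Positive values mean the post-trained model is less diverse.
    report["distinct_n_drop"] = report["distinct_n_base"] - report["distinct_n_tuned"]
    report["entropy_drop"] = report["entropy_base"] - report["entropy_tuned"]
    return report

# Hypothetical samples for a prompt like "Complete: My neighbor works as ..."
base = ["a nurse", "an engineer", "a farmer", "a teacher"]
tuned = ["a teacher", "a teacher", "a teacher", "a nurse"]
report = homogenization_report(base, tuned)
```

Aggregating these drops over a fixed prompt set, with the same sampling settings for both models, would give one candidate number a system card could report. Embedding-based or semantic diversity measures are natural alternatives to the lexical metrics used here.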

Theory of Impact

Homogenization impoverishes everyone and disproportionately harms the people on the margins by amplifying social biases. Today’s social problems can escalate to catastrophic levels because of GenAI.
Additionally, diversity is at the core of technical problems, such as the trade-off between hallucination and creativity, and out-of-distribution robustness.
This work directly addresses the homogenization problem by:
• Studying how diversity is mediated in LLMs’ internals
• Developing ways to promote diversity in LLMs

Over the past decades, we have seen social media and recommender systems create echo chambers and filter bubbles worldwide. In some cases, the resulting misinformation and polarization have led to social instability, erosion of democratic participation, and real-world violence. I believe LLM Homogenization is poised to cause similar issues, but on a larger, potentially catastrophic scale. This work actively seeks to mitigate LLM Homogenization and, in turn, its associated existential risk.

People

Ian Rios-Sialer

Team Member

Grants Received: no grants recorded

Funding Asks

grantmaking.ai Launch Round

Applied

Minimum

$10,000

Ideal

$80,000

How the money will be spent

The money would support this research stream for the next six to eight months.

Minimal:

• Compute for projects I collaborate on and/or mentor (AISC, Queer In AI, SPAR)

• Data collection, surveys, etc.

Nice to have:

• Travel to conferences, workshops, etc. for my mentees and me

Ideal:

• Living expenses (San Francisco rent, etc.) during my career transition, to advance this research
