Friends and Grandmothers in Silico: Localizing Entity Cells in Language Models

active

Mechanistic interpretability project that localizes entity-selective neurons (“entity cells”) in language models, then causally intervenes on those neurons during PopQA-style factual question answering to study how they mediate entity-centric factual recall.