Research Scientist, Interpretability: Anthropic

Dec 4, 2025   |   Location: San Francisco, CA   |   Deadline: Not specified

Experience: Mid

Continent: North America

Salary: $315,000 - $560,000 USD (Annual Base Salary)

The Interpretability team at Anthropic reverse-engineers trained models to gain a mechanistic understanding of how they work, which the team views as the most robust way to ensure safety. The role involves "doing biology or neuroscience" on neural networks: mapping their parameters to meaningful algorithms.

You will work on problems such as "superposition," decompose models into interpretable components, and build circuits to understand the mechanisms behind model computation (e.g., multi-hop reasoning, planning).

Responsibilities
Develop methods for understanding LLMs by reverse-engineering the algorithms learned in their weights.

Experimentation: Design and run robust experiments, both in toy scenarios and at scale in large models.

Feature Analysis: Create and analyze new interpretability features and circuits.

Infrastructure: Build infrastructure for running experiments and visualizing results.

Communication: Work with colleagues to communicate results internally and publicly.

Requirements
Education: Bachelor's degree in a related field or equivalent experience.

Research Track Record: Strong track record of scientific research (in any field), with some work on interpretability.

Coding: Familiarity with Python is required.

Mindset: Comfortable with messy experimental science; views research and engineering as interconnected (writing code, designing experiments, interpreting results).

Communication: Ability to articulate motivations, teach others, and communicate results (even null ones).

Key Publications/Concepts to Know
A Mathematical Framework for Transformer Circuits

Toy Models of Superposition

Scaling Monosemanticity

Concepts: Superposition, Monosemanticity, Circuits, Induction Heads.