[Expression of Interest] Research Manager, Interpretability: Anthropic
Feb 2, 2026
Location: San Francisco, CA (Hybrid - 3 days/week in office)
Deadline: Not specified
Experience: Senior
Continent: North America
Salary: $350,000 - $500,000 Annual Salary + Equity
Crucial Context: This is a "Waitlist" Application. The posting explicitly states: "We don't have open Research Manager positions on the Interpretability team at this time." Anthropic is creating a talent pool. They know they will need leaders soon as they scale, and they want to have a pipeline of vetted candidates ready to go.
## The Mission: The MRI for AI
The Interpretability team is arguably the most "Anthropic" team at Anthropic. Their goal is Mechanistic Interpretability. Most companies treat LLMs as a "Black Box" (Input -> Magic -> Output). This team treats them as a biology problem: they want to open the brain of the AI, map the neurons, and understand exactly which circuits fire when the model lies, plans, or codes.
## Key Responsibilities
The Research Catalyst: You are not necessarily writing the papers yourself. You are managing the "geniuses" who do. Your job is to unblock them, manage their careers, and ensure the team is moving in a cohesive direction.
Operationalizing Science: Research is messy and ambiguous. Your job is to bring just enough process (project planning, hiring pipelines, goal setting) to keep the team moving fast without stifling creativity.
Hiring & Scaling: A massive part of this role is "sourcing and closing." You are building the team that will likely define AI safety standards for the industry.
## Strategic Analysis: The "Biology" Analogy
The Scientific Method: The JD uses terms like "biology," "neuroscience," and "reverse engineering." They view the model as a natural organism that needs to be studied.
The "Superposition" Problem: You need to be familiar with their specific research lineage. They are trying to solve "Superposition" (where one neuron encodes multiple unrelated concepts). If you haven't read Toy Models of Superposition, you are unlikely to pass the screen.
Safety via Understanding: Anthropic's thesis is that you cannot make AI safe if you don't know how it works. This role is the bridge between "pure research" and "applied safety."
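The core idea of superposition fits in a few lines of NumPy. The sketch below is purely illustrative (the feature count, dimensionality, and sparsity level are assumptions, not values from Anthropic's papers): it packs six sparse features into a two-dimensional space and shows that reconstruction is noisy because feature directions are forced to overlap.

```python
# Minimal NumPy sketch of "superposition": more features than dimensions.
# All names and numbers here are illustrative, not from any Anthropic codebase.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_dims = 6, 2            # 6 concepts squeezed into 2 neurons

# Each feature gets a (near-)unit direction in the 2-dimensional hidden
# space, so the directions necessarily overlap and interfere.
W = rng.normal(size=(n_features, n_dims))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Sparse activations: most features are off most of the time, which is
# what makes superposition a viable compression strategy.
x = np.where(rng.random(n_features) < 0.2, rng.random(n_features), 0.0)

hidden = x @ W                        # compress: 6 features -> 2 dims
x_hat = np.maximum(hidden @ W.T, 0)   # decode with a ReLU, as in the toy model

print("true features: ", np.round(x, 2))
print("reconstruction:", np.round(x_hat, 2))
# The reconstruction is imperfect: interference between overlapping feature
# directions is exactly the phenomenon "Toy Models of Superposition" studies.
```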
## Candidate Profile
The Technical Manager: You need 2-5 years of management experience. You cannot just be a senior researcher looking to switch tracks; they need someone who has already managed performance reviews, hiring loops, and team conflicts.
Domain Fluency: You don't need to be the world's leading expert on Sparse Autoencoders (a minimal sketch follows this section), but you need to understand them well enough to challenge a Research Lead or explain the implications to non-technical leadership.
The "Servant Leader": This is a support role. You succeed when your team publishes. If you have a massive ego and need your name first on every paper, this is likely a bad fit.
## Critical "Knockout" Criteria
Management Track Record: If you have never line-managed technical staff, you are better off applying to the Research Scientist or Engineer roles (as the JD suggests).
Location: They are strict on the San Francisco 3 days/week policy.
Literature Familiarity: The application asks specifically: "Why do you want to work on the Anthropic interpretability team?" and explicitly mentions they want "deeper and more specific engagement" than just general AI interest. You must reference their specific papers (e.g., Monosemanticity, Circuits).