
Research Engineer, Interpretability at Anthropic

Oct 6, 2025   |   Location: San Francisco, CA   |   Deadline: Not specified

Experience: Senior

Continent: North America

Salary: $315,000–$560,000 per year (base salary)

This role is on Anthropic's Interpretability team, which is dedicated to reverse-engineering how trained neural networks work. The team believes that developing a mechanistic understanding of these models is the most robust way to make advanced AI systems safe. As a Research Engineer, you will join this effort, treating neural networks like complex programs to be reverse-engineered, and building the "microscopes" needed to do so. The team's work is foundational to making AI models like Claude safer and more trustworthy.

Responsibilities
Implement and analyze research experiments, both quickly in toy scenarios and at scale in large models.

Set up and optimize research workflows to run efficiently and reliably at a large scale.

Build tools and abstractions to support a rapid pace of research experimentation.

Develop and improve tools and infrastructure to help other teams use the Interpretability team's work to improve model safety.

Requirements
Required Experience:

5-10+ years of experience building software.

Highly proficient in at least one programming language (e.g., Python, Rust, Go, Java) and productive with Python.

Some experience contributing to empirical AI research projects.

A strong ability to prioritize and direct effort toward the most impactful work.

A preference for fast-moving, collaborative projects over extensive solo efforts.

Strong candidates may also have experience with:

Designing a codebase that allows for rapid experimentation and analysis.

Optimizing the performance of large-scale distributed systems.

Collaborating closely with researchers.

Language modeling with transformers.

GPUs or PyTorch.

Apply Now
