Research Engineer, Interpretability: Anthropic
Dec 4, 2025
Location: San Francisco, CA
Deadline: Not specified
Experience: Mid
Continent: North America
Salary: $315,000 - $560,000 USD (Annual Base Salary)
This engineering role on the Interpretability team focuses on implementing and scaling research efforts to mechanistically understand how trained large language models (LLMs) work. The ultimate goal is to enable safe and steerable AI systems by reverse-engineering the algorithms learned in the neural network weights.
This position involves close collaboration with researchers and focuses heavily on building the infrastructure and tools necessary for "big science" experiments.
Key Responsibilities
Implementation & Analysis: Implement and analyze research experiments, running them quickly in toy scenarios and efficiently at large scale in production models.
Infrastructure Optimization: Set up and optimize research workflows to run efficiently and reliably at large scale. This includes optimizing performance of large-scale distributed systems and ML training (including parallelizing to many GPUs).
Tool Development: Build tools and abstractions to support the rapid pace of research, such as developing tools for easy access to LLM internals (like Garcon) or creating interactive visualizations.
Safety Support: Develop and improve tools and infrastructure to support other Anthropic teams in using Interpretability work to improve overall model safety.
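As a toy illustration of the tooling work described above (tools for easy access to LLM internals, such as Garcon), the sketch below shows the hook pattern in plain Python: registering callbacks on a model so intermediate activations can be captured without modifying the model itself. All names here (`ToyModel`, `register_hook`, the lambda "layers") are hypothetical stand-ins, not Anthropic's actual interfaces; real tools operate on production transformer models.

```python
# Toy sketch of the "hook" pattern used by internals-access tooling.
# ToyModel and its API are hypothetical, for illustration only.

class ToyModel:
    """A stand-in 'model': a pipeline of named layers (plain functions)."""

    def __init__(self, layers):
        self.layers = layers  # list of (name, fn) pairs
        self.hooks = {}       # layer name -> list of callbacks

    def register_hook(self, name, fn):
        """Call fn(name, activation) whenever the named layer runs."""
        self.hooks.setdefault(name, []).append(fn)

    def forward(self, x):
        for name, layer in self.layers:
            x = layer(x)
            for hook in self.hooks.get(name, []):
                hook(name, x)
        return x


# Usage: capture an intermediate "activation" without touching the model code.
model = ToyModel([("embed", lambda x: x * 2),
                  ("mlp", lambda x: x + 1)])

captured = {}
model.register_hook("embed", lambda name, act: captured.setdefault(name, act))

out = model.forward(3)
# out == 7; captured == {"embed": 6}
```

The design point is separation of concerns: researchers instrument a model from the outside, so experiment code stays decoupled from model code.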
Minimum Qualifications
Experience: Requires 5–10+ years of experience building software.
Education: At least a Bachelor's degree in a related field or equivalent experience.
Technical Proficiency: Must be highly proficient in at least one programming language (e.g., Python, Rust, Go, Java) and, in particular, productive in Python.
Research Exposure: Some experience contributing to empirical AI research projects is necessary.
Mindset: Candidates should demonstrate a strong ability to prioritize and direct effort toward the most impactful work. They must be comfortable operating with ambiguity, questioning assumptions, and should prefer fast-moving collaborative projects over extensive solo efforts.
Preferred Experience (Strong Candidates)
Strong candidates will also have experience in specialized areas critical to scaling Anthropic's research:
Designing a clean codebase that allows researchers to quickly code experiments, launch them, and analyze results efficiently.
Optimizing the performance of large-scale distributed systems.
Collaborating closely with researchers to translate scientific goals into engineering solutions.
Experience with language modeling with transformers.
Direct experience working with GPUs or PyTorch.