Anthropic AI Safety Fellow, US: Anthropic

Aug 12, 2025 | Location: Remote-Friendly (Travel Required) | San Francisco, CA | Deadline: August 17

Anthropic's AI Safety Fellows Program is an external collaboration designed to accelerate progress in AI safety research. It offers promising talent a chance to gain hands-on research experience by bridging the gap between industry engineering expertise and the research skills needed for impactful work. Fellows will work on an empirical project aligned with Anthropic's research priorities, with the goal of producing a public output such as a paper submission. The program provides substantial support, including mentorship from Anthropic researchers, a weekly stipend, funding, and access to shared workspaces. The next cohort is scheduled to start in October 2025.

Key Responsibilities
Work on an empirical project aligned with Anthropic's AI safety research priorities.

Use external infrastructure, such as open-source models and public APIs, to conduct research.

Collaborate with an assigned mentor from Anthropic to guide the project.

Develop the skills necessary to contribute meaningfully to critical AI safety research.

Aim to produce a public output, such as a paper submission, by the end of the program.

Mentor & Research Areas
Fellows will be matched with mentors who lead projects in key AI safety research areas, including:

Scalable Oversight: Developing techniques to keep highly capable models helpful and honest.

Adversarial Robustness and AI Control: Creating methods to ensure advanced AI systems remain safe and harmless.

Model Internals / Mechanistic Interpretability: Advancing the understanding of how large language models work internally.

Required Qualifications
Motivation to reduce catastrophic risks from advanced AI systems.

Excitement about transitioning into full-time empirical AI safety research.

Strong technical background in computer science, mathematics, physics, or a related field.

Strong programming skills, particularly in Python and machine learning frameworks.

Ability to work full-time on the fellowship for at least 2 months.

Ability to obtain US work authorization (visa sponsorship is not available, but support for OPT/CPT on F-1 visas is offered).

Comfort working in a fast-paced, collaborative environment and executing projects independently.

Interview Process
Interviews are conducted on a rolling basis until the August 17 deadline. The process consists of the following stages:

Initial Application and References: Submit your application and provide references.

Technical Assessment: A 90-minute coding screen in Python.

Technical Interview: A 55-minute coding-based technical interview without any machine learning components.

Final Interviews: A 15-minute research discussion and a take-home research project (5-hour work period + 30-minute review).

Offer Decisions: Offers will be extended on a rolling basis, with a target date of early October.
