
Anthropic AI Safety Fellow: Anthropic

Jan 24, 2026   |   Location: Hybrid (London, UK or Berkeley, CA) and remote   |   Deadline: Not specified

Experience: Entry

Salary: $3,850 USD / £2,310 GBP / $4,300 CAD per week

Role Overview
This is essentially a "Paid Audition" for the Major Leagues of AI Research. Anthropic (creators of Claude) is using this 4-month program to find hidden gems: researchers who might not have a PhD from Stanford but have the raw talent to solve "Alignment" problems (preventing AI from destroying humanity).

The Mission: You spend 4 months doing "Empirical Research." You aren't building product features; you are writing papers on topics like "How do we know if an AI is lying?" or "Can we mathematically prove this model won't output a bioweapon recipe?"

The Path: ~40% of Fellows get full-time offers at Anthropic. The rest usually land at other top labs (OpenAI, DeepMind) or top PhD programs.

Key Responsibilities
Independent Research: You pick a mentor (like Jan Leike or Nicholas Carlini, who are rockstars in this field) and a topic (e.g., Mechanistic Interpretability).

Coding & Experimentation: You will burn through ~$15k/month in compute credits running experiments on open-source models (Llama 3, Mistral) or public APIs.

Publishing: The goal is to produce a "Public Output" (usually an arXiv paper) by the end of the 4 months.
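
To make "empirical research" concrete, here is a minimal sketch of the kind of throwaway experiment loop involved, assuming the Hugging Face transformers library and an open-weights chat model; the model name and probe prompts are illustrative, not taken from the posting:

```python
# Illustrative only: probe an open-weights model with a few questions and log
# its answers for later analysis. Model name and prompts are hypothetical.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # any open chat model works
    device_map="auto",
)

probe_prompts = [
    "If you were unsure of an answer, would you admit it? Why or why not?",
    "Describe a case where a confident wrong answer causes real harm.",
]

for prompt in probe_prompts:
    result = generator(prompt, max_new_tokens=128, do_sample=False)
    print(f"PROMPT: {prompt}\nANSWER: {result[0]['generated_text']}\n")
```

The real work is in designing what to probe and how to score it; the scaffolding itself stays this small.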

Critical "Fine Print"
Start Date: July 2026. (Yes, they recruit about half a year in advance because the talent pool is that competitive).

Visa: No Sponsorship. You must already have the right to work in the US, UK, or Canada. This is a major blocker for international applicants.

Recruitment Partner: The application is handled by Constellation, a separate entity. Do not be alarmed if the email comes from them; it is legitimate.

Strategic Analysis: What they actually want
They list "No prior AI experience necessary," but this is misleading. While they don't require a PhD, they require "Research Taste" and "Engineering Velocity."

Research Taste: Can you ask the right questions? (e.g., "Instead of just training a model, how do we measure if it's deceived us?")

Engineering Velocity: Can you write a Python script in 2 hours that spins up 50 GPUs to test a hypothesis? (Proficiency in Python/PyTorch is non-negotiable).
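
For a flavor of what "a Python script in 2 hours" means in practice, here is a hypothetical PyTorch sketch (not Anthropic code): sweep one variable, measure one quantity, print a table, move on.

```python
# Hypothetical "engineering velocity" script: sweep batch size, measure
# throughput on whatever GPU is available, print, decide what to try next.
import time

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# A stand-in workload: a small Transformer encoder with batch-first inputs.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=4,
).to(device).eval()

seq_len = 128
for batch_size in (1, 8, 32, 128):
    x = torch.randn(batch_size, seq_len, 512, device=device)
    if device == "cuda":
        torch.cuda.synchronize()  # don't time queued-but-unfinished kernels
    start = time.perf_counter()
    with torch.no_grad():
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    print(f"batch={batch_size:4d}  tokens/sec={batch_size * seq_len / elapsed:,.0f}")
```

The point is the turnaround time, not the benchmark: parameterize, measure, read the numbers, and iterate within the hour.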

Mentors & Topics (The "Cheat Sheet")
If you apply, you should mention one of these specific areas to show you understand their work:

Mechanistic Interpretability: "Reverse engineering" the neural network to find where "knowledge" is stored (Mentors: Trenton Bricken, Chris Olah's old team); a toy code sketch follows this list.

Scalable Oversight: Using AI to supervise other AI (because humans are too slow).

Model Organisms: Creating "toy models" that are intentionally misaligned to study them (like lab rats for AI safety).
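
Here is the toy mechanistic-interpretability sketch referenced above, assuming GPT-2 via the Hugging Face transformers library (the fellowship's actual tooling is not specified in the posting): a forward hook captures the residual-stream activations of one transformer block, the raw material for "where is knowledge stored" questions.

```python
# Toy mechanistic-interpretability sketch (hypothetical, not Anthropic code):
# capture the residual-stream activations of one GPT-2 block via a forward hook.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

captured = {}

def save_activation(module, inputs, output):
    # Each GPT-2 block returns a tuple; element 0 is the hidden states.
    captured["block_6"] = output[0].detach()

# Hook block index 6 of GPT-2's 12 transformer blocks.
handle = model.h[6].register_forward_hook(save_activation)

tokens = tokenizer("The Eiffel Tower is in", return_tensors="pt")
with torch.no_grad():
    model(**tokens)
handle.remove()

# Shape: (batch, seq_len, hidden_dim); hidden_dim is 768 for GPT-2 small.
print("residual stream:", captured["block_6"].shape)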
🚀 Apply Now
