Home » Jobs » Applied Safety Research Engineer Safeguards At Anthropic

Applied Safety Research Engineer, Safeguards at Anthropic

Jan 24, 2026   |   Location: San Francisco, CA (Hybrid - 25% in-office requirement)   |   Deadline: Not specified

Experience: Mid

Continent: North America

Salary: $320,000 - $405,000 USD (Base Salary) + Equity

Role Overview
This role sits at the critical intersection of Software Engineering and Safety Research. While the "Research Scientists" are inventing new model architectures, your job is to build the "Testing Infrastructure" (Evaluations or "Evals") that proves whether those models are safe enough to launch.

You are answering the question: "How do we mathematically prove that Claude won't help a user build a biological weapon or execute a cyberattack?"

Key Responsibilities
Designing "Exams" for AI: You aren't just running existing tests; you are inventing new ways to test for complex harms. This involves creating "Representative Test Data"β€”likely using synthetic data generation to create thousands of tricky prompts.

Pipeline Plumber: You take a safety concept (e.g., "Don't allow prompt injection") and turn it into a scalable, automated pipeline that runs every time a new model checkpoint is trained.

Grading the Grader: You must validate that your automated grading systems (often other LLMs) are accurate. If the grader says "This response is safe," can you trust it?

Policy Translation: You work with non-technical Policy experts who define "Harm." You translate their definitions into Python code and executable tests.

Strategic Analysis: The "Safeguards" Team
The "Moat": In 2026, building a smart model is becoming a commodity. Building a safe model that enterprises trust is the competitive moat. This team is the gatekeeper of that trust.

The Engineering Heavy Lift: Unlike pure research roles, this job requires production-grade engineering. You need to deal with distributed systems, data processing at scale, and messy real-world data.

Visa Support: Unlike the "Fellowship" role, Anthropic will sponsor visas for this position, making it accessible to top global talent.

Candidate Profile: Who gets hired?
The ML Engineer with a Conscience: You have 4+ years of experience shipping ML products, but you care more about safety than capability.

The "Red Teamer": If you have experience with Jailbreaking, Adversarial Attacks, or Trust & Safety, you are a top-tier candidate.

The Rigorous Analyst: You understand statistics well enough to know when an improvement in a safety score is statistically significant versus just noise.

Critical Application Questions
The application form includes two specific "Knockout" questions you must nail:

"Do you have experience building model evaluations?"

"Please write a few sentences about your work contributing to model evals."

Tip: Do not just say "I ran the benchmarks." Describe how you designed a test for a specific failure mode (e.g., hallucination, bias) and how you measured the success.
πŸš€ Apply Now

πŸ‘€ 11 views   |   πŸš€ 0 clicks

🧠 Related Jobs