Sr. Engineering Manager, AI Evaluation Platform: Apple (Services Engineering)
Dec 11, 2025
Location: Seattle, Washington, United States
Deadline: Not specified
Experience: Senior
Continent: North America
Salary: $216,600 to $325,500
This is a senior leadership role within Apple Services Engineering focused on building the next generation of AI evaluation systems. You will build and lead a new engineering team responsible for democratizing AI evaluation across the organization. The goal is to architect high-availability services and internal tools that enable self-service evaluation at scale for Generative AI and Agentic systems.
Key Responsibilities
Team Building: Hire, mentor, and grow a diverse team of backend and platform engineers from the ground up.
Technical Strategy: Own the roadmap for the core evaluation engine. Architect APIs, SDKs, and distributed services that turn complex metrics into simple, self-service calls.
Operationalizing Science: Partner with Applied Scientists to translate novel metrics and scoring algorithms into scalable, production-grade services.
System Integration: Serve as a technical bridge between research and the broader engineering ecosystem.
Engineering Rigor: Establish the SDLC standards for code quality, automated testing (CI/CD), and monitoring.
Qualifications
Minimum Qualifications
Management Experience: 5+ years of direct engineering management experience with a track record of hiring and retaining high-performing engineers.
Technical Experience: 7+ years of hands-on software engineering experience with deep proficiency in the Python ecosystem (e.g., FastAPI, Pydantic, Pandas).
Research Collaboration: Demonstrated experience partnering with Applied Scientists or Researchers to operationalize scientific code.
AI Literacy: Functional literacy in AI/ML concepts (datasets, training vs. inference, evaluation metrics).
Infrastructure: Strong expertise in API Design, internal tools, and operational excellence (CI/CD, Docker/Kubernetes, Datadog/Prometheus).
Preferred Qualifications
MLOps Experience: Experience building foundational AI infrastructure (model registries, feature stores) using tools like Kubernetes, Ray, or Kubeflow.
Evaluation Frameworks: Deep familiarity with modern evaluation tools like DeepEval, Ragas, TruLens, or LangSmith.
GenAI Knowledge: Understanding of challenges related to LLMs and Agents (token economics, rate limits, multi-step reasoning evaluation).
Startup Mindset: Experience incubating new teams or thriving in high-ambiguity environments.