Sr. Engineering Manager, AI Evaluation Platform: Apple (Services Engineering)
Dec 11, 2025
Location: Seattle, Washington, United States
Deadline: Not specified
Experience: Senior
Continent: North America
Salary: $216,600 to $325,500
This is a senior leadership role within Apple Services Engineering focused on building the next generation of AI evaluation systems. You will build and lead a new engineering team responsible for democratizing AI evaluation across the organization. The goal is to architect high-availability services and internal tools that enable self-service evaluation at scale for Generative AI and Agentic systems.
Key Responsibilities
Team Building: Hire, mentor, and grow a diverse team of backend and platform engineers from the ground up.
Technical Strategy: Own the roadmap for the core evaluation engine. Architect APIs, SDKs, and distributed services that turn complex metrics into simple, self-service calls.
Operationalizing Science: Partner with Applied Scientists to translate novel metrics and scoring algorithms into scalable, production-grade services.
System Integration: Serve as a technical bridge between research and the broader engineering ecosystem.
Engineering Rigor: Establish the SDLC standards for code quality, automated testing (CI/CD), and monitoring.
Qualifications
Minimum Qualifications
Management Experience: 5+ years of direct engineering management experience with a track record of hiring and retaining high-performing engineers.
Technical Experience: 7+ years of hands-on software engineering experience with deep proficiency in the Python ecosystem (e.g., FastAPI, Pydantic, Pandas).
Research Collaboration: Demonstrated experience partnering with Applied Scientists or Researchers to operationalize scientific code.
AI Literacy: Functional literacy in AI/ML concepts (datasets, training vs. inference, evaluation metrics).
Infrastructure: Strong expertise in API Design, internal tools, and operational excellence (CI/CD, Docker/Kubernetes, Datadog/Prometheus).
Preferred Qualifications
MLOps Experience: Experience building foundational AI infrastructure (model registries, feature stores) using tools like Kubernetes, Ray, or Kubeflow.
Evaluation Frameworks: Deep familiarity with modern evaluation tools like DeepEval, Ragas, TruLens, or LangSmith.
GenAI Knowledge: Understanding of challenges related to LLMs and Agents (token economics, rate limits, multi-step reasoning evaluation).
Startup Mindset: Experience incubating new teams or thriving in high-ambiguity environments.