AI Evaluation Specialist: Binance

Sep 7, 2025 | Location: Hong Kong or Taipei, Taiwan (Remote) | Deadline: Not specified

As an AI Evaluation Specialist, you'll be a key part of Binance's AI adoption journey. You will design and implement comprehensive evaluation frameworks that span the entire lifecycle of LLM agents, from pre-deployment testing to ongoing monitoring and refinement. The goal is to ensure the AI agents used in areas like Customer Service, Growth, and Compliance are reliable, accurate, and compliant.

Responsibilities
Participate in the entire software development lifecycle for AI agents.

Act as the go-to person for all matters related to AI agent evaluation and continuous monitoring.

Create test strategies and perform hands-on testing to ensure the accuracy and performance of AI and data applications.

Conduct root cause analysis for test failures and drive optimizations.

Design and develop AI-powered internal tools that improve engineering and testing efficiency.

Required Skills & Qualifications
Education: Bachelor's or Master's degree in Computer Science, AI, Data Science, or a related field.

AI Knowledge: A strong understanding of Large Language Models (LLMs), autonomous AI agents, and their system architectures.

Evaluation Methodologies: Experience with AI evaluation methods, including offline benchmarking, online monitoring, and human-AI evaluation.

Software Engineering: Familiarity with software engineering best practices like Test-Driven Development (TDD) and Behavior-Driven Development (BDD).

Analytical Skills: Strong analytical ability, with a focus on data-driven diagnostics and root cause analysis.

Bonus: Experience with evaluation tools and frameworks such as Opik or LangSmith.
