
Senior AI Research Engineer, Model Inference: Tether

Sep 11, 2025   |   Location: Remote (London, UK)   |   Deadline: Not specified

Experience: Senior

Continent: Europe

Tether is a global financial technology company pioneering a financial revolution through reserve-backed tokens and blockchain technology. Their solutions, including the world's most trusted stablecoin (USDT), empower businesses to seamlessly integrate digital tokens. Beyond finance, Tether has divisions focused on sustainable Bitcoin mining (Tether Power), AI and peer-to-peer technology (Tether Data), digital education (Tether Education), and future innovations (Tether Evolution). They operate as a lean, remote-first company with a global team.

About the Role:
Tether is seeking an experienced AI Model Engineer with deep expertise in kernel development, model optimization, fine-tuning, and GPU acceleration. The role involves extending their inference framework to support and fine-tune language models, with a strong focus on mobile and integrated GPU acceleration using Vulkan. The engineer will play a critical role in pushing the boundaries of on-device inference performance for next-generation small and large language models (SLMs/LLMs).

Responsibilities:

Implement and optimize custom inference and fine-tuning kernels for language models across multiple hardware backends.

Design, customize, and optimize Vulkan compute shaders for quantized operators and fine-tuning workflows.

Architect and support advanced quantization techniques to improve efficiency and memory usage.

Debug and optimize GPU operators and resolve GPU acceleration issues on Vulkan and mobile GPUs.

Conduct evaluation and benchmarking of model performance.

Collaborate with research and engineering teams to prototype and scale new model optimization methods.

Deliver production-grade, efficient language model deployments for mobile and edge use cases.

Requirements:

Proficiency in C++ and GPU kernel programming.

Proven expertise in GPU acceleration with the Vulkan framework.

Strong background in quantization and mixed-precision model optimization.

Experience with Vulkan compute shader development and customization.

Familiarity with LoRA fine-tuning and parameter-efficient training methods.

Ability to debug GPU-specific performance and stability issues on desktop and mobile devices.

Familiarity with large language model architectures (e.g., Qwen, Gemma, LLaMA, Falcon).

Experience implementing custom backward operators for fine-tuning.
🚀 Apply Now
