Senior AI Research Engineer, Model Inference (Remote): Tether
Sep 18, 2025
Location: Remote
Deadline: Not specified
Experience: Senior
Tether is seeking an experienced AI Model Engineer with deep expertise in kernel development, model optimization, fine-tuning, and GPU acceleration. The role involves extending Tether's inference framework to support inference and fine-tuning of language models, with a strong focus on mobile and integrated GPU acceleration using Vulkan. The ideal candidate is a hands-on engineer who will play a critical role in pushing the boundaries of desktop and on-device inference and fine-tuning performance for next-generation SLMs/LLMs.
Important Notes:
The company warns against recruitment scams and advises candidates to apply only through official channels, verify recruiter identities, and never provide payment or financial details. All official communication will come from emails ending in @tether.to or @tether.io.
Responsibilities:
Implement and optimize custom inference and fine-tuning kernels for small and large language models.
Implement and optimize full and LoRA fine-tuning across multiple hardware backends.
Design and extend datatype and precision support (e.g., int, float, mixed precision).
Design, customize, and optimize Vulkan compute shaders for quantized operators.
Investigate and resolve GPU acceleration issues on Vulkan and integrated/mobile GPUs.
Architect support for advanced quantization techniques.
Debug and optimize various GPU operators.
Integrate and validate quantization workflows for training and inference.
Conduct evaluation and benchmarking of model performance.
Conduct GPU testing across desktop and mobile devices.
Collaborate with teams to prototype, benchmark, and scale new model optimization methods.
Deliver production-grade language model deployment for mobile and edge use cases.
Work with cross-functional teams to integrate optimized frameworks into production pipelines.
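To illustrate the kind of quantization work the responsibilities above describe, here is a minimal sketch (not Tether's actual implementation) of symmetric per-tensor int8 quantization and dequantization, the arithmetic that underlies quantized operators:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Quantized tensor: int8 values plus the float scale needed to dequantize.
struct QuantizedTensor {
    std::vector<int8_t> q;
    float scale;
};

// Symmetric per-tensor int8 quantization: scale = max|w| / 127,
// q_i = clamp(round(w_i / scale), -127, 127).
QuantizedTensor quantize_int8(const std::vector<float>& w) {
    float max_abs = 0.0f;
    for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
    float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
    QuantizedTensor out{{}, scale};
    out.q.reserve(w.size());
    for (float v : w) {
        long r = std::lround(v / scale);
        out.q.push_back(static_cast<int8_t>(
            std::clamp(r, -127L, 127L)));
    }
    return out;
}

// Dequantize back to float: w_i ≈ q_i * scale.
std::vector<float> dequantize(const QuantizedTensor& t) {
    std::vector<float> out;
    out.reserve(t.q.size());
    for (int8_t q : t.q) out.push_back(static_cast<float>(q) * t.scale);
    return out;
}
```

The round-trip error is bounded by half the scale, which is the baseline accuracy trade-off any quantized operator (CPU or Vulkan compute shader) has to manage.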
Requirements:
Proficiency in C++ and GPU kernel programming.
Proven expertise in GPU acceleration with the Vulkan framework.
Strong background in quantization and mixed-precision model optimization.
Experience in Vulkan compute shader development and customization.
Familiarity with LoRA fine-tuning and parameter-efficient training methods.
Ability to debug GPU-specific issues on desktop and mobile devices.
Hands-on experience with mobile GPU acceleration and model inference.
Familiarity with large language model architectures (e.g., Qwen, Gemma, LLaMA).
Experience implementing custom backward operators for fine-tuning.
Experience creating custom datasets for style transfer and domain-specific fine-tuning.
Demonstrated ability to apply empirical research to overcome model challenges.
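For context on the LoRA requirement above, a minimal sketch of the LoRA forward pass (hypothetical shapes, row-major matrices; not any specific framework's API): the frozen base weight W is augmented with a trainable low-rank update, y = W x + (alpha / r) * B (A x), where A is r x d_in and B is d_out x r.

```cpp
#include <vector>

// Dense row-major matrix-vector product: y = M x, M is rows x cols.
std::vector<float> matvec(const std::vector<float>& m, const std::vector<float>& x,
                          int rows, int cols) {
    std::vector<float> y(rows, 0.0f);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            y[i] += m[i * cols + j] * x[j];
    return y;
}

// LoRA forward: y = W x + (alpha / r) * B (A x).
// W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) train.
std::vector<float> lora_forward(const std::vector<float>& W,
                                const std::vector<float>& A,
                                const std::vector<float>& B,
                                const std::vector<float>& x,
                                int d_out, int d_in, int r, float alpha) {
    std::vector<float> y = matvec(W, x, d_out, d_in);   // frozen base path
    std::vector<float> ax = matvec(A, x, r, d_in);      // down-projection
    std::vector<float> bax = matvec(B, ax, d_out, r);   // up-projection
    float s = alpha / static_cast<float>(r);
    for (int i = 0; i < d_out; ++i) y[i] += s * bax[i]; // scaled low-rank update
    return y;
}
```

Because only A and B receive gradients, the custom backward operators mentioned above need to differentiate through just the low-rank path, which is what makes parameter-efficient fine-tuning tractable on mobile-class GPUs.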
Apply Now