AI Research Engineer - Pre-training: Tether (via Tether Data division)
Feb 6, 2026 | Location: 100% Remote (Global) | Deadline: Not specified | Experience: Mid | Salary: $300k - $600k+
This is not a "Wrapper" role. You are not building a chatbot using OpenAI's API. You are building the API. Tether recently invested hundreds of millions into Northern Data Group to secure 10,000+ NVIDIA H100 GPUs. They are pivoting from "just a stablecoin issuer" to a massive AI infrastructure player.
The Mission: Build a Sovereign/Uncensorable LLM. Tether's ethos is libertarian/privacy-focused (see their "Keet" P2P app). They likely want to train a Foundation Model that is independent of Big Tech (Google/Microsoft) control. You are training models from scratch (ab initio), which is significantly harder than fine-tuning.
## Key Responsibilities
The "GPU Shepherd": You will manage training runs on clusters of thousands of GPUs. This involves handling "loss spikes," hardware failures, and straggler nodes.
Architecture Design: You aren't just reusing the Llama 3 architecture. You are exploring "non-transformer modifications" (perhaps Mamba/SSMs or Ring Attention) to improve efficiency.
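To make the non-transformer idea concrete, here is a toy diagonal state-space recurrence in PyTorch (shapes and parameters are illustrative, not Tether's architecture): each channel carries a hidden state forward instead of attending over all prior tokens, so cost grows linearly with sequence length rather than quadratically.

```python
import torch

def diagonal_ssm(x: torch.Tensor, a: torch.Tensor,
                 b: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
    """x: (T, D) inputs; a, b, c: (D,) per-channel recurrence weights."""
    h = torch.zeros_like(x[0])
    ys = []
    for x_t in x:
        h = a * h + b * x_t  # h_t = a*h_{t-1} + b*x_t  ->  O(T*D) total
        ys.append(c * h)
    return torch.stack(ys)

T, D = 128, 64
y = diagonal_ssm(torch.randn(T, D),
                 a=torch.full((D,), 0.9),  # decay of the hidden state
                 b=torch.ones(D),
                 c=torch.ones(D))
print(y.shape)  # torch.Size([128, 64])
```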
Distributed Systems Engineering: You must implement 3D Parallelism (Data, Pipeline, and Tensor parallelism) to fit massive models into GPU memory. If you don't know how to optimize all-reduce operations, you will fail here.
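As a concrete reference point, the sketch below times a single NCCL all-reduce with `torch.distributed` (the launch command, script name, and buffer size are illustrative). In data parallelism, this is the collective that sums gradients across every rank on every step, so its latency directly gates throughput.

```python
# Launch with e.g.: torchrun --nproc_per_node=8 allreduce_demo.py
import time

import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    # Stand-in for one rank's gradient buffer: 1 GiB of fp16.
    grad = torch.randn(512 * 1024 * 1024, dtype=torch.float16, device="cuda")

    torch.cuda.synchronize()
    start = time.perf_counter()
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)  # ring all-reduce under NCCL
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    if rank == 0:
        gib = grad.numel() * grad.element_size() / 2**30
        print(f"all-reduce of {gib:.1f} GiB took {elapsed * 1e3:.1f} ms")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()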
## Strategic Analysis: The "Tether" Pivot
The "Why": Tether generates billions in profit from interest on US Treasury bills backing USDT. They have too much cash. They are converting that cash into Compute Power (GPUs) because they view Compute as the "Oil" of the next decade.
The Stack: They mention "Keet" and "Tether Data." This implies they want to run AI locally on devices or via P2P networks, reducing reliance on centralized cloud providers.
The Autonomy: Unlike Google DeepMind or OpenAI, where safety teams might throttle your research, Tether is likely to have a much more "accelerationist" (e/acc) culture.
## Candidate Profile: The "Full-Stack" Researcher
The Academic: A PhD is preferred because you need to understand the math behind why a model isn't converging.
The Plumber: You need to be comfortable with Megatron-LM, DeepSpeed, or FSDP. You know that pre-training is 10% math and 90% fighting with Linux drivers, InfiniBand networking, and CUDA kernels.
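As a taste of that plumbing layer, here is a minimal FSDP sketch using PyTorch's `torch.distributed.fsdp` (the model, sizes, and learning rate are placeholders): parameters, gradients, and optimizer state are sharded across ranks so a model too large for one GPU can still train.

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes the process group is already initialized (e.g. via torchrun).
layer = nn.TransformerEncoderLayer(d_model=1024, nhead=16, batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=12).cuda()

# Shard parameters, gradients, and optimizer state across all ranks.
model = FSDP(model, use_orig_params=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

x = torch.randn(2, 512, 1024, device="cuda")  # (batch, seq, d_model)
loss = model(x).float().pow(2).mean()         # dummy objective
loss.backward()
optimizer.step()
```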
The Scaler: You have experience with "Scaling Laws." You know how to predict how a model will perform at 70B parameters based on a test run at 1B parameters.
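In practice that means fitting a parametric loss curve on small pilot runs and extrapolating. Below is a Chinchilla-style sketch (the run data, initial guesses, and target sizes are made up for illustration).

```python
import numpy as np
from scipy.optimize import curve_fit

def loss_model(X, E, A, alpha, B, beta):
    N, D = X  # parameter count, training tokens
    return E + A / N**alpha + B / D**beta

# (params, tokens, final loss) from hypothetical small pilot runs.
runs = np.array([
    [1e8, 2e9,    3.49],
    [3e8, 6e9,    2.98],
    [6e8, 1.2e10, 2.74],
    [1e9, 2e10,   2.59],
    [3e9, 6e10,   2.34],
    [7e9, 1.4e11, 2.19],
])
N, D, L = runs.T

popt, _ = curve_fit(loss_model, (N, D), L,
                    p0=[1.7, 400.0, 0.34, 410.0, 0.28], maxfev=100_000)

# Extrapolate to a 70B-parameter model trained on 1.4T tokens.
print(f"predicted loss: {loss_model((7e10, 1.4e12), *popt):.2f}")
```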
## Critical "Knockout" Factors
Scale of Experience: If your biggest training run was on 8 GPUs, you are too junior. They need someone who has run jobs on 512+ or 1,000+ GPUs.
The "From Scratch" Factor: Have you only fine-tuned (PEFT/LoRA)? If so, you are likely not a fit. This role requires Pre-training experience (initializing random weights and teaching the model English/Code from zero).
HPC Knowledge: You must understand interconnects (NVLink/InfiniBand) and how to minimize communication overhead between nodes; the back-of-the-envelope calculation below shows why.
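All bandwidth figures here are rough, illustrative numbers: the point is that syncing a large model's gradients over inter-node links is roughly an order of magnitude slower than over NVLink, which is why tensor parallelism stays inside the node and communication gets overlapped with compute.

```python
# Gradients of a 70B-parameter model in fp16: ~140 GB per step.
params = 70e9
grad_bytes = params * 2              # 2 bytes per fp16 gradient

# A ring all-reduce moves roughly 2x the buffer size per GPU.
traffic = 2 * grad_bytes

nvlink_bw = 450e9  # ~450 GB/s intra-node (illustrative)
ib_bw = 50e9       # 400 Gb/s InfiniBand ~= 50 GB/s inter-node

print(f"intra-node sync: {traffic / nvlink_bw:.2f} s")  # ~0.62 s
print(f"inter-node sync: {traffic / ib_bw:.2f} s")      # ~5.60 s
```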