Lead AI Inference Engineer (Netherlands): Tether
Jan 16, 2026
Location: Netherlands (Remote - Distributed Team)
Deadline: Not specified
Experience: Mid
Continent: Europe
This is a highly specialized engineering leadership role focused on Edge AI and local inference. Rather than training models on massive cloud server farms, you will architect the systems that allow Large Language Models (LLMs) to run efficiently on consumer devices (laptops and smartphones).
You will lead a "pod"—a small, cross-functional team of 3-5 engineers—bridging the gap between low-level performance code (C++) and application-level code (JavaScript). Your primary mission is to enable AI features within Tether’s peer-to-peer applications (like Keet) where data privacy is paramount and no user data leaves the device.
Key Responsibilities
Inference Engine Optimization: Deploy machine learning models to edge devices using high-performance frameworks like llama.cpp and ggml.
Cross-Functional Leadership: Manage a diverse pod containing Middleware Engineers (JS/TS), Foundation Engineers (C++), and QA.
Bridge Building: Create the architecture that allows a high-level application (likely Electron or React-based) to communicate with a low-level C++ inference engine without blocking or crashing the user interface.
Hardware Acceleration: Ensure models run smoothly by leveraging specific hardware capabilities (e.g., Apple Metal, Nvidia CUDA, or Vulkan) via the inference engine.
Technical Stack & Requirements
Core Languages: C++ (Expert level required for the engine) and JavaScript/TypeScript (for the application layer).
Inference Libraries: Deep experience with llama.cpp, ggml, and ONNX is mandatory. These are the current standards for running quantized LLMs on consumer hardware.
AI Architecture: Understanding of Transformers, LLMs, and Deep Learning concepts.
Hardware APIs (Bonus): Experience with Vulkan or CUDA to optimize matrix multiplications on GPUs.
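For readers unfamiliar with why llama.cpp and ggml can run LLMs on consumer hardware, the core trick is weight quantization: storing float weights as small integers plus a scale factor. A minimal sketch of the idea, greatly simplified from real ggml formats (Q4_K and friends use per-block scales and packed sub-byte layouts):

```javascript
// Toy 8-bit symmetric quantization: store weights as Int8 plus one scale.
// Int8 storage is 4x smaller than float32, at the cost of bounded rounding error.
function quantizeQ8(weights) {
  const amax = Math.max(...weights.map(Math.abs));
  const scale = amax / 127 || 1; // avoid divide-by-zero for all-zero blocks
  const q = Int8Array.from(weights, (w) => Math.round(w / scale));
  return { q, scale };
}

function dequantizeQ8({ q, scale }) {
  return Array.from(q, (v) => v * scale);
}

const packed = quantizeQ8([0.5, -1.0, 0.25]);
const restored = dequantizeQ8(packed); // close to the originals, within ~1/127
```

Shrinking weights this way is what lets a 7B-parameter model fit in a few gigabytes of laptop RAM instead of requiring datacenter GPUs.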
Strategic Context: Why this role exists
Tether is expanding beyond stablecoins into "Tether Data" and peer-to-peer (P2P) technology.
Privacy-First AI: Traditional AI relies on sending user data to a central server (like OpenAI). Tether’s P2P ethos requires AI to run locally so user data remains private.
Cost Efficiency: "Edge Inference" offloads the compute cost from the company to the user's device, allowing for scalable, free-to-use AI tools without massive infrastructure bills.
Location Context: The Netherlands
While Tether is a remote-first global company, hiring in the Netherlands often targets specific talent pools:
Embedded & Systems Talent: The Netherlands (specifically regions like Eindhoven) is a global hub for high-performance C++ and embedded systems engineering, making it an ideal market for this specific skillset.
30% Ruling: If you are an expat relocating to the Netherlands for this role, the high salary bracket likely qualifies you for the "30% ruling," a significant tax advantage for skilled migrants.
🚀 Apply Now