Lead AI Inference Engineer (Netherlands): Tether
Jan 16, 2026
Location: Netherlands (Remote - Distributed Team)
Deadline: Not specified
Experience: Mid
Continent: Europe
This is a highly specialized engineering leadership role focused on Edge AI and local inference. Rather than training models on massive cloud server farms, you will architect the systems that allow Large Language Models (LLMs) to run efficiently on consumer devices (laptops and smartphones).
You will lead a "pod"—a small, cross-functional team of 3-5 engineers—bridging the gap between low-level performance code (C++) and application-level code (JavaScript). Your primary mission is to enable AI features within Tether’s peer-to-peer applications (like Keet) where data privacy is paramount and no user data leaves the device.
Key Responsibilities
Inference Engine Optimization: Deploy machine learning models to edge devices using high-performance frameworks like llama.cpp and ggml.
Cross-Functional Leadership: Manage a diverse pod containing Middleware Engineers (JS/TS), Foundation Engineers (C++), and QA.
Bridge Building: Create the architecture that allows a high-level application (likely Electron or React-based) to communicate with a low-level C++ inference engine without blocking or crashing the user interface.
Hardware Acceleration: Ensure models run smoothly by leveraging specific hardware capabilities (e.g., Apple Metal, Nvidia CUDA, or Vulkan) via the inference engine.
Technical Stack & Requirements
Core Languages: C++ (Expert level required for the engine) and JavaScript/TypeScript (for the application layer).
Inference Libraries: Deep experience with llama.cpp, ggml, and ONNX is mandatory. These are the current standards for running quantized LLMs on consumer hardware.
AI Architecture: Understanding of Transformers, LLMs, and Deep Learning concepts.
Hardware APIs (Bonus): Experience with Vulkan or CUDA to optimize matrix multiplications on GPUs.
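For readers unfamiliar with why llama.cpp and ggml can run LLMs on consumer hardware, the core trick is weight quantization: storing float weights as small integers plus a scale factor. A minimal sketch of the idea, greatly simplified from real ggml formats (Q4_K and friends use per-block scales and packed sub-byte layouts):

```javascript
// Toy 8-bit symmetric quantization: store weights as Int8 plus one scale.
// Int8 storage is 4x smaller than float32, at the cost of bounded rounding error.
function quantizeQ8(weights) {
  const amax = Math.max(...weights.map(Math.abs));
  const scale = amax / 127 || 1; // avoid divide-by-zero for all-zero blocks
  const q = Int8Array.from(weights, (w) => Math.round(w / scale));
  return { q, scale };
}

function dequantizeQ8({ q, scale }) {
  return Array.from(q, (v) => v * scale);
}

const packed = quantizeQ8([0.5, -1.0, 0.25]);
const restored = dequantizeQ8(packed); // close to the originals, within ~1/127
```

Shrinking weights this way is what lets a 7B-parameter model fit in a few gigabytes of laptop RAM instead of requiring datacenter GPUs.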
Strategic Context: Why this role exists
Tether is expanding beyond stablecoins into "Tether Data" and peer-to-peer (P2P) technology.
Privacy-First AI: Traditional AI relies on sending user data to a central server (like OpenAI). Tether’s P2P ethos requires AI to run locally so user data remains private.
Cost Efficiency: "Edge Inference" offloads the compute cost from the company to the user's device, allowing for scalable, free-to-use AI tools without massive infrastructure bills.
Location Context: The Netherlands
While Tether is a remote-first global company, hiring in the Netherlands often targets specific talent pools:
Embedded & Systems Talent: The Netherlands (specifically regions like Eindhoven) is a global hub for high-performance C++ and embedded systems engineering, making it an ideal market for this specific skillset.
30% Ruling: If you are an expat relocating to the Netherlands for this role, the high salary bracket likely qualifies you for the "30% ruling," a significant tax advantage for skilled migrants.
🚀 Apply Now