
Lead AI Inference Engineer (Romania): Tether

Jan 16, 2026   |   Location: Remote   |   Deadline: Not specified

Experience: Mid

Continent: Europe

This is not a standard "Train a massive model in the cloud" role. Tether is hiring for Edge AI / Local Inference.

You will lead a small "pod" (cross-functional team) to build the engine that runs Large Language Models (LLMs) locally on user devices (laptops, phones) rather than on massive server farms. This aligns with Tether's "Data" division and their flagship app Keet (a peer-to-peer, privacy-focused chat application). The goal is to deliver AI features where no data ever leaves the user's device.

Key Responsibilities
The Bridge: You sit between the low-level "metal" (C++ inference engines) and the high-level product (JavaScript/TypeScript apps).

Optimization: Your main technical challenge is making heavy AI models run smoothly on consumer hardware.

Leadership: Manage a pod of 3-5 engineers (mixing C++ and JS talent), handling architectural choices, code quality, and release cycles.

Integration: Deploy models using llama.cpp and ggml, the open-source standards for high-performance local inference.
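A quick back-of-envelope illustration of why the optimization work matters: the dominant lever in the llama.cpp/ggml ecosystem is weight quantization, which trades precision for memory. A minimal sketch (in TypeScript, the app-layer language here), with illustrative numbers; real GGUF quantization formats mix bit widths and add per-block scale overhead, so this is an estimate, not the engine's actual accounting:

```typescript
// Back-of-envelope memory footprint for a quantized LLM.
// paramCount = number of weights; bitsPerWeight = quantization level.
function modelSizeGiB(paramCount: number, bitsPerWeight: number): number {
  const bytes = (paramCount * bitsPerWeight) / 8;
  return bytes / 1024 ** 3;
}

// A 7-billion-parameter model:
const fp16 = modelSizeGiB(7e9, 16); // ~13.0 GiB, beyond many consumer laptops
const q4 = modelSizeGiB(7e9, 4);    // ~3.3 GiB, fits in typical device RAM
```

This is why 4-bit quantization is the default for edge deployment: it cuts the footprint roughly fourfold relative to fp16 at a modest quality cost.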

Technical Stack (The "Edge" Stack)
Core Languages: C++ (Primary/Expert), JavaScript/TypeScript (Secondary).

Inference Engines: llama.cpp, ggml, ONNX.

Hardware Acceleration: Vulkan, CUDA, TVM, MLC.

Architecture: Transformers, LLMs, and P2P (Peer-to-Peer) networking principles.
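The "bridge" responsibility described above can be sketched as a thin async boundary between the native engine and the app layer. Everything below is hypothetical (Tether's actual API is not public): the interface shape, the names, and the mock engine are illustrative stand-ins for a compiled llama.cpp wrapper exposed via a Node addon.

```typescript
// Hypothetical C++/JS bridge: the native engine streams tokens,
// the TypeScript app layer consumes them asynchronously.
interface InferenceEngine {
  generate(prompt: string): AsyncIterable<string>;
}

// Mock engine standing in for the native addon, so the consuming
// code can be exercised without any compiled C++.
const mockEngine: InferenceEngine = {
  async *generate(_prompt: string) {
    for (const token of ["Hello", ", ", "world", "!"]) yield token;
  },
};

async function runPrompt(engine: InferenceEngine, prompt: string): Promise<string> {
  let out = "";
  for await (const token of engine.generate(prompt)) {
    out += token; // a real UI would render each token incrementally
  }
  return out;
}
```

Streaming an `AsyncIterable` rather than returning a single string is the natural fit for local inference, where tokens arrive one at a time and the UI should show partial output immediately.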

Strategic Context: Why this role exists
Tether is moving beyond stablecoins into "Freedom Tech."

Privacy: By running AI locally (Edge AI), Tether removes the need to send user data to centralized servers (like OpenAI or Google). This fits the ethos of their P2P app, Keet.

Cost: Running inference on the user's device reduces Tether's server-side inference costs to near zero.

Independence: Tether is building a stack that is resilient to censorship and independent of Big Tech APIs.

Candidate Profile
The Hacker-Engineer: You likely spend time on Hugging Face or Reddit’s /r/LocalLLaMA. You know what quantization (4-bit, 8-bit) is and why it matters for fitting models into limited device memory.

The Leader: You can guide a small team through the chaos of rapid R&D.

The Polyglot: You are comfortable writing high-performance C++ in the morning and reviewing React/TypeScript PRs in the afternoon.
🚀 Apply Now
