AI Network System Architect: NVIDIA

Aug 3, 2025   |   Location: Yokneam, Israel; Tel Aviv, Israel; Raanana, Israel   |   Deadline: Not specified

Experience: Mid

Continent: Asia

NVIDIA is seeking a highly motivated Senior AI Network System Architect to join its team of experts. This role is pivotal in shaping the future of high-performance computing, particularly in Machine Learning (ML) and Artificial Intelligence (AI). The architect will work on next-generation InfiniBand, NVLink, and Ethernet systems, which are essential for connecting and powering the world's most advanced AI clusters. This is an opportunity to work with cutting-edge technology and drive innovation in networking solutions used by top researchers and engineers worldwide.

Responsibilities
As an AI Network System Architect, you will be doing the following:

Investigating emerging technologies and methodologies in ML and AI to understand their interactions with network infrastructure.

Executing workloads on AI systems, conducting profiling, and analyzing bottlenecks and potential enhancements.

Conducting research and implementing optimizations for communication libraries such as NCCL and UCX.

Spearheading the conceptualization of next-generation networking products specifically designed to support and accelerate state-of-the-art ML workloads.

Developing models for simulations, analyzing simulation results, and developing optimization algorithms.

Collaborating with multi-functional teams, including other architecture teams, logic design, system software, firmware, and ML research teams, to ensure successful project execution.

Requirements
Education: M.Sc. or Ph.D. degree in Computer Science, Computer Engineering, or Electrical Engineering.

Experience: At least 2 years of industry or research experience in computer networks.

ML/AI Workloads: Extensive expertise in ML/AI workloads, particularly in distributed training.

Network Understanding: Excellent understanding of large-scale network behavior and the effect of distributed computing workloads on the network.

Simulation: Experience in the development of simulation environments.

Skills: Great problem-solving and critical-thinking skills.

Work Environment: Ability to thrive in a fast-paced and dynamic environment, and to work concurrently with multiple groups in the organization.

Preferred Skills
To stand out, candidates should possess:

Knowledge of communication libraries such as NCCL, UCX, and UCC.

Good knowledge of network protocols such as InfiniBand, IP, TCP, and RoCE, and of network topologies.

Experience with Python, C++, and Docker.

Expertise in system engineering, operations research, and intricate hardware-software integrated systems.

Demonstrated experience with DLRM (Deep Learning Recommendation Model), LLM (Large Language Model), or generative AI workloads.