Software Engineer for AI Inference Optimization

NVIDIA Gruppe • Toronto, ON, Canada • Posted June 10, 2026

Location Toronto, ON

Job Type Full-time

Category Other-General

Posted June 10, 2026

Become a pivotal part of NVIDIA's team as a Senior Software Engineer specializing in AI inference optimization. Your skills in GPU kernel development and benchmarking will be central to this role.

This role calls for seasoned software engineers dedicated to refining AI inference systems. You will help architect and optimize the vLLM inference framework, focusing on high-performance computing across GPU clusters. You will collaborate with various teams to push the boundaries of accelerated computing.

Key Responsibilities:
• Enhance vLLM's features to optimize new models
• Benchmark and optimize GPU kernels using advanced methods
• Create methodologies for industry-leading benchmarking tools
• Design orchestration for large-scale inference deployments
• Conduct original research for ML Systems advancements

Requirements:
• PhD with top publications in ML Systems or a relevant field
• Expertise in programming with Python and C/C++
• Know...
