Hardware Health

Microsoft Corporation • Multiple Locations, United States, United States • Posted June 15, 2026

Location Multiple Locations, United States
Job Type Full-time
Category other-general
Posted June 15, 2026
**Overview**

Microsoft AI operates one of the world’s most advanced AI training infrastructures, featuring multi-gigawatt clusters spanning tens of thousands of high-performance GPUs, ultra-low-latency NVLink/NVSwitch networks, and innovative liquid-cooling systems. Our team is seeking a Member of Technical Staff, Hardware Health, to ensure these systems deliver sustained reliability, performance, and availability across exascale-class deployments.

We work closely with research, hardware, datacenter, and platform engineering teams to develop predictive health models, failure detection frameworks, and autonomous remediation systems that keep our AI clusters operating at frontier scale.

Our newly formed organization, Microsoft AI, is dedicated to advancing Copilot and other consumer AI products and research. The team is responsible for Copilot, Bing, Edge, and generative AI research. Join us and help shape the future of personal computing.

Microsoft’s ...

Interested in this role?

Click the button below to start your application.

Apply Now