ML Research Platform Engineer (Distributed Training & HPC)

QNT Partners • Singapore • Posted June 10, 2026

Job Type Full-time

Category Other-General

Location: Singapore, Hong Kong or Shanghai 

About the role  
We are looking for a platform engineer to build the infrastructure that powers our next-generation machine learning research. Think: large-scale experimentation, distributed training, and reproducibility. 

This is not an applied ML role. You will not be fine-tuning LLMs or building agents. Instead, you will build the systems that enable researchers to train models at scale.

What you will own
Distributed training pipelines for GPU-accelerated workloads (PyTorch, JAX)
Experiment management and model versioning
Resource scheduling on on-premise HPC clusters and cloud (Slurm, Kubernetes)
Observability and debugging for complex training jobs
Data lineage and artifact tracking

Must haves (non-negotiable) ...
            
