Site Reliability Engineer

RCS TECH • Mexico, Mexico, Mexico • Posted June 04, 2026

Location Mexico, Mexico

Job Type Full-time

Category IT / Computing / Software

What You’ll Do  
 Reliability & Operations 
 - Own availability, latency, and scalability across SaaS and AI systems  
 - Define and enforce SLOs, SLIs, and error budgets 
 - Participate in a global on-call rotation (~1 week every 4 weeks) 
 - Lead incident response and drive blameless postmortems with systemic fixes 
 Platform & Infrastructure  
 - Architect and operate on-premises and multi-region, multi-cloud environments 
 - Manage large-scale Kubernetes workloads 
 - Build and evolve infrastructure using Terraform and Ansible 
 - Improve system resilience, fault isolation, and capacity planning 
 AI/ML & Automation  
 - Build and scale agentic AI systems for triage, anomaly detection, and self-healing 
 - Ensure reliability of model serving infrastructure 
 - Operate, optimize, and scale distributed systems 
 What You Bring ...
            
