Site Reliability Engineer, Observability
Location
Toronto, ON
Job Type
Full-time
Category
Other-General
Posted
June 06, 2026
Role Overview
This role is eligible for our hybrid work model: two days in-office. As a Site Reliability Engineer – Observability, you will play a key part in maturing our observability capabilities by standardizing instrumentation, improving telemetry quality, and enabling faster root cause analysis that directly impacts MTTR and MTTD.
Responsibilities
Support and evolve end-to-end observability solutions for collecting, shipping, storing, and querying OpenTelemetry signals (metrics, logs, and traces) across infrastructure, containers, and Kubernetes environments.
Administer and operate core observability platforms (Splunk, New Relic, ClickHouse, Grafana, Lightrun), including service onboarding, access management, configuration, upgrades, and ongoing platform health.
Contribute to building and advancing a modern OpenTelemetry-based observability ecosystem that supports multiple telemetry types at scale.
Improve and standardize instrumentation practices across service...