Lead DevOps Engineer
Litmus
📍Toronto, CA
Job Description
Who is Litmus
Litmus is a growth-stage software company that is transforming the way companies harness the power of machine data to improve operations. Our software is enabling the next wave of digital transformation for the biggest and most innovative companies in the world – making Industrial IoT, Industry 4.0 and Edge Computing a reality. We just completed our Series B financing round, and we are looking to expand our team.
Why join the Litmus team
You want to be a part of something great
We pride ourselves on building the most talented and experienced team in the industry, one that knows how to win. We work hard and the results speak for themselves. We’re trusted by industry leaders like Google, Dell, Intel, Mitsubishi, Hewlett Packard Enterprise and others as we partner to help Fortune 500 companies digitally transform.
You want to define and shape the future
At Litmus you’ll have the opportunity to support and influence the next wave of the industrial revolution by democratizing industrial data. We’re leading the industry in edge computing to feed artificial intelligence, machine learning and other applications that rapidly change the way manufacturers operate.
You want to build and shape your career
Join a growth-stage Silicon Valley company to build and define your career path in an environment that allows you to progress rapidly. Bring your unique experience, talent and expertise and add to it by collaborating with and learning from the brightest people in the industry.
We are committed to hiring great people who are passionate about what they do and thrive on winning as a team. We welcome anyone and everyone who wishes to join the Litmus team to apply and share their career experience, dreams and goals with us.
About the Role
Litmus is building the industrial IoT platform of record, and our DevOps function is the engine that lets engineering move fast with confidence. This is a senior technical leadership role — reporting directly to the Head of Technology — for someone who is ready to own the DevOps function end-to-end across the entire company, and to lead its transformation into an AI-enabled engineering discipline.
You will inherit a capable, distributed team and a meaningful technical foundation: self-hosted GitLab for CI/CD, multi-cloud infrastructure across AWS and GCP, Kubernetes (EKS) workloads, and an on-premises VMware estate. Your mandate is to level up this foundation, drive down delivery friction for the broader engineering organization, and make strong technical decisions without needing direction for day-to-day operations.
If you thrive at the intersection of platform engineering, cloud infrastructure, and security automation — and you want to be the person who sets the standard — this role is for you.
What You’ll Own
Technical Leadership & Team
Lead and mentor a distributed DevOps team spanning North America and India, including an infrastructure security-focused sub-team.
Serve as the primary technical decision-maker for the DevOps function — architecture, tooling choices, prioritization, and delivery standards.
Partner with Engineering, QA, and Product leadership to reduce delivery friction and improve DORA metrics (lead time for changes, deployment frequency, MTTR, change failure rate).
Represent the DevOps function at the leadership level, including communicating roadmap, risks, and platform health to the Head of Technology and broader Technology leadership.
CI/CD Platform (GitLab)
Own the self-hosted GitLab platform — upgrades, runner fleet management (VMware-hosted and cloud), and platform health.
Drive maturity of the CI/CD Catalog and shared template library (ci-common/gitlab-templates), ensuring teams can self-serve without bespoke pipeline configuration.
Evolve pipeline capabilities: container image scanning, IaC static analysis, SAST, SBOM/CVE generation, and MR-triggered security scans.
Establish and enforce merge request standards, branch protection policies, and CODEOWNERS governance across the GitLab organization.
Kubernetes & Cloud Infrastructure
Own EKS day-2 operations: cluster upgrades, node group management, networking (private API endpoints, Cloudflare tunnel integration), and reliability posture.
Manage multi-cloud infrastructure across AWS (primary) and GCP, including resource lifecycle, cloud cost optimization, and account governance.
Lead the rationalization of legacy infrastructure (on-prem Nexus, Concourse CI) and drive the migration to cloud-native equivalents where appropriate.
Maintain and improve the Terraform IaC estate, including drift detection, module governance, and GitLab CI-driven plan/apply workflows.
Security & Identity
Drive the rollout and stabilization of SSO federation across vCenter/VMware, AWS IAM Identity Center, and Azure AD groups.
Own the security tooling stack: Qualys vulnerability scanning, Defender alert triage, container scanning pipelines, and SBOM/CVE reporting for product releases.
Establish and enforce secrets management standards using 1Password across pipelines and infrastructure automation.
Ensure data security in transit and at rest as automation and self-service capabilities expand.
Observability & Platform Engineering
Build and own the internal developer platform vision — reducing cognitive load on engineers, QA, and program managers through self-service tooling and automation.
Lead the observability stack: Grafana (Helm-deployed on EKS), alerting pipelines, and infrastructure/application performance monitoring.
Drive a metrics-first culture for the DevOps function, using DORA metrics and custom platform health indicators to guide roadmap decisions.
Evaluate and recommend tooling investments that improve developer experience, pipeline performance, and release confidence.
AI-Enabled DevOps Transformation
Own and drive the AI transformation of the DevOps function — identifying where AI tooling can meaningfully reduce toil, accelerate delivery, and improve reliability across the engineering organization.
Integrate AI-assisted tooling into CI/CD pipelines: automated code review augmentation, AI-generated pipeline diagnostics, intelligent test selection, and anomaly detection in build and deployment workflows.
Embed AI capabilities into the observability and incident response stack — using LLM-assisted root cause analysis, alert summarization, and runbook generation to reduce mean time to resolution.
Champion AI coding tool adoption across the engineering team — evaluating, piloting, and governing tools (LLM-powered IDEs, AI pair programming, code generation) to maximize productivity while maintaining security and IP standards.
Apply AI-driven approaches to cloud cost optimization — using intelligent anomaly detection and spend forecasting to inform FinOps decisions across AWS and GCP.
Build a point of view on AI governance for the DevOps function — defining appropriate data handling boundaries, prompt security practices, and acceptable use policies as LLM tooling becomes embedded in engineering workflows.
Required Experience & Skills
5+ years of progressive DevOps/platform engineering experience, with at least 2 years in a technical lead or staff-level role.
Deep, hands-on experience with GitLab CI/CD:
Self-hosted GitLab administration (upgrades, runners, platform governance)
Building and maintaining shared CI/CD templates and catalogs
Pipeline security integrations (SAST, container scanning, IaC analysis)
Production Kubernetes experience (preferably EKS):
Cluster upgrades, node management, networking, and RBAC
Day-2 operations and reliability engineering
GitLab-driven deployment workflows
Multi-cloud infrastructure proficiency across AWS and at least one of GCP/Azure:
AWS IAM, Organizations, SSO/IAM Identity Center
VPC networking, EKS, ECR, and cloud cost optimization
Infrastructure as Code with Terraform:
Module design, remote state, drift detection
CI/CD-driven plan/apply pipelines
Identity and access management:
Azure AD / Microsoft Entra ID — SSO federation and group-based access
Experience federating VMware vCenter, AWS, or similar platforms with AD/LDAP
Security tooling experience: vulnerability scanning (Qualys or equivalent), secrets management (1Password, Vault, or equivalent), SBOM/CVE pipeline integration.
Fluency in at least one scripting language (Bash, Python, or similar) for automation and tooling.
Strong written and verbal communication — able to write clear design documents, drive technical alignment, and represent the team in cross-functional and leadership conversations.
Demonstrated experience using AI tooling in an engineering context — whether in pipelines, developer tooling, observability, or infrastructure automation — and a clear point of view on where it creates genuine leverage vs. hype.
Nice-to-Have Experience
Familiarity with Yocto/BitBake build systems and embedded Linux release pipelines.
Experience with Concourse CI or other pipeline orchestration systems in a migration context.
Cloudflare Zero Trust / WARP / Tunnel architecture.
Experience with DataHub, Grafana Loki, or similar observability/data catalog tooling.
Exposure to industrial IoT platforms, edge computing, or embedded Linux product delivery.
Experience managing GitLab at scale across 50+ repositories and multiple engineering teams.
Hands-on experience building AI-augmented DevOps workflows: LLM-powered runbook generation, AI-assisted incident triage, or natural language interfaces to infrastructure tooling.
Familiarity with MCP (Model Context Protocol) server integration or agentic AI tooling applied to developer workflows.
About Litmus
Litmus builds the industrial IoT platform that helps manufacturers connect, collect, and act on machine data at scale. Our engineering team ships across a complex, real-world stack — embedded Linux, cloud-native services, and everything in between. The DevOps function sits at the center of all of it.
We move fast, we care about craft, and we invest in the people and tools that let us do both sustainably.
Compensation
CA$145,000 – CA$185,000 base salary, commensurate with experience.
Total package includes benefits, equity participation, and professional development allowance.
Litmus is committed to building an inclusive team. We encourage applications from candidates of all backgrounds and will provide accommodation throughout the recruitment process upon request.
Details
- Department: Engineering
- Work Type: Hybrid
- Locations: Toronto, CA
- Posted: April 9, 2026
- Source: ashby