RemoteAtlas
Staff Production Operations Engineer

Redpanda

Remote · Full-time · $211k - $256k · Posted about 16 hours ago
Software Development

Summary

Redpanda is hiring a Staff Production Operations Engineer to join its Software Development team. Key skills: Go, AWS, GCP, Azure, AI.

About the Role:

We're looking for a Staff Production Operations Engineer to drive Redpanda's reliability operations program. This role combines hands-on site reliability engineering with planning and coordination skills to ensure a world-class operations practice across a globally distributed engineering team.

In this role, you'll work with the broader Engineering team, Engineering leadership, Product, and Customer Success to drive operational excellence. You'll coordinate our on-call and incident lead rotations, drive blameless post-incident reviews, and own the processes that help us respond faster, learn more from outages, and systematically improve reliability. We're looking for someone who can use AI agents to automate the operational toil that slows teams down, building on Redpanda's own ADP platform to do it.

You Will:  

  • Drive process improvements across the incident lifecycle: severity models, triage enforcement, alert noise reduction, and follow-up completion rates

  • Coordinate the on-call program across multiple geographies: manage schedules and shadow rotations, onboard new engineers, and ensure consistent coverage

  • Select incidents for review, facilitate blameless post-incident reviews, document findings, and track follow-up completion. Contribute to incident follow-ups where possible, either by fixing issues directly or by prototyping solutions

  • Build AI agents to automate operational toil, including on-call automation, incident summarization, post-incident review prep, follow-up tracking, and on-call analytics

  • Maintain runbooks, playbooks, and incident process documentation, and keep them current as processes evolve

You Have: 

  • 5+ years of experience in site reliability engineering, DevOps, or production operations in large-scale, highly reliable environments

  • A track record of leading initiatives end-to-end, from design and planning to execution and production operation

  • Hands-on experience with incident management tooling (incident.io, PagerDuty, or similar) and observability stacks (Datadog, Grafana, Sentry, CloudWatch, or equivalent)

  • Strong fluency with reliability concepts: MTTD, MTTR, MTTA, error budgets, SLOs

  • Experience building automation and tooling to reduce operational toil 

  • Proficiency in Go (or a comparable systems language, with a willingness to ramp up)

  • Experience with AI-assisted software development workflows including tools like Claude Code

  • Working knowledge of at least one of AWS / Azure / GCP, including infrastructure as code for system and network infrastructure

  • Strong written communication; ability to drive alignment across engineering teams without direct authority

Nice to Have: 

  • Hands-on experience building agents or automations using LLMs

  • Familiarity with Redpanda, Apache Kafka, or other streaming infrastructure

  • Prior experience in a fast-growing B2B infrastructure or developer tools company

 

The U.S. base salary range for this role is $220,000 - $256,000 (CA, NY, WA) and $211,000 - $250,000 (other US locations). Our salary ranges are determined by role, level, and location. We strive to consider each candidate's job-related skills, location, experience, and relevant education or training to determine individual base salary. Your talent partner will share more about the specific salary range for your preferred location during the hiring process.

Please note that Redpanda uses artificial intelligence (AI) technology to assist in the screening and assessment of applications for this position. However, all final hiring decisions are made by our human hiring team.

Vacancy Status: This job posting is for an existing vacancy.
