RemoteAtlas
© 2026 RemoteAtlas. All rights reserved.
FluidStack

Principal Operations Engineer, Reliability — Data Center Operations

U.S. Remote · Full-time · $150K - $250K · Posted about 22 hours ago
Software Development

Summary

FluidStack is hiring a Principal Operations Engineer, Reliability — Data Center Operations to join its Software Development team. Key skills: AI.

About the role

About Fluidstack

We exist to make humanity more free. For most of human history, you farmed or you starved. Technology gave people more time for the things they wanted to do, instead of the things they had to do. Powerful AI will be the biggest lever for human choice we've ever built, but only if models are aligned with what humanity actually wants. There are groups building AI who don't share these goals. Whoever deploys frontier compute infrastructure fastest will decide whether AI expands human freedom or shrinks it.

We're singularly focused on delivering tens to hundreds of gigawatts of compute faster than anyone else, rethinking every layer of the stack. We acquire power, design and build data centers, and operate them, with teams spanning hardware and software. Speed and scale are our key differentiators. Come be a part of building civilization-scale infrastructure for AI.


We hire people who care deeply about this problem space. If that is you, please apply!

How We Operate

  • Extreme ownership. Full autonomy. Own things end to end, often taking on work outside your core role without being asked, to get things done.

  • Velocity. We drive everything forward as fast as possible.

  • First principles. Challenge every assumption. Zero analogy thinking, no egos, the best idea wins.

  • Love of the game. The frontier of AI is the most interesting problem of our time. We put in long hours at high intensity to push the frontier forward.

The Data Center Operations Team

Examples of key problems the team is working on

  • Operate at the scale of a nation, not a building. We are accelerating toward 100 GW by the end of the decade, roughly the entire electricity consumption of Japan. You won't just run a data center; you'll run infrastructure on the scale of a G7 nation.

  • Fly the plane while it's being built. You'll run flawless operations inside a live construction zone, adapting in real time and turning the pace of build-out into our advantage. You will redefine operational excellence.

  • Write the playbook, don't inherit it. Most operators step into someone else's system. Here you build one, leaving your fingerprints on how the whole company runs. As we scale 100x, you'll set the standards, shape the operating model, and grow the team into the thousands.

Role Scope

  • Take the on-call escalation when a site hits trouble and triage it remotely, using working knowledge of the team and the systems to decide what to escalate and when, and how to keep the field crew focused without burying them.

  • Get on a plane when it matters: travel site to site (50%+ of the time) to work live incidents and post-incident reviews on the floor, and bring the practices that worked elsewhere with you.

  • Own root cause analysis on significant events through to closure and track corrective actions to done, killing the underlying class of failure rather than the one instance in front of you.

  • Read the patterns across the fleet’s incidents and RCAs, push the few highest-value learnings through to closure, and stay honest about what’s achievable and what to drop instead of boiling the ocean.

  • Carry learnings and practices from one campus to the next so a fix at one site becomes the standard everywhere before the failure repeats.

  • Write the operational assessment standard and audit each campus against it, feeding what you find straight back into the corrective-action loop.

What We're Looking For

The below is a starting point. We always make space for exceptional people, so if you don't fit this role exactly, tell us where you would.

  • You’ve run a live critical operation and led a team of operators, and you carry the deep, earned judgment that comes from owning the floor when it counts.

  • You’ve been the person a site calls when something breaks, triaged the problem over the phone, and known exactly when to escalate and when to let the field team work it.

  • You’ve authored root cause analyses on significant events and tracked corrective actions to closure, and you can show the difference between an RCA that closed a ticket and one that killed a class of failure.

  • You’ve sat with a pile of RCA actions and cut it to the few that matter, because you know an operation that commits to everything finishes nothing.

  • You’ve traveled site to site, walked the floor, and left each operation better than you found it, carrying the practices that worked from one into the next.

  • You’ve written the standard, not just followed it, audited real sites against it without flinching from what you found, and can hold one bar across domains you don’t all live in.

  • You’ve built an assessment, audit, qualification, or training program from scratch.

  • Bonus: hyperscale or large colocation experience at hundreds of MW or more; direct exposure to hardware or network operations, not only facilities; experience with incident.io or equivalent incident tooling; and DCIM (data center infrastructure management) tools.

Salary & Benefits

  • Competitive total compensation package (salary + equity).

  • Retirement or pension plan, in line with local norms.

  • Health, dental, and vision insurance.

  • Generous PTO policy, in line with local norms.

The base salary range for this position is $150,000 - $250,000 per year, depending on experience, skills, qualifications, and location. This range represents our good faith estimate of the compensation for this role at the time of posting. Total compensation may also include equity in the form of stock options.

We are committed to pay equity and transparency.

Fluidstack is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Fluidstack will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

You will receive a confirmation email once your application has successfully been accepted. If there is an error with your submission and you did not receive a confirmation email, please email careers@fluidstack.io with your resume/CV, the role you've applied for, and the date you submitted your application. Someone from our recruiting team will be in touch.
