Site Reliability Engineer (SRE)
Job Description
**Why join us?**

**Handoff is the AI agent that runs a construction company.** We help remodelers automate estimating, streamline operations, and win more work - backed by real-time cost data, intuitive design, and workflows that “speak contractor.” With over 10,000 monthly active users and $6B in annualized project volume already flowing through our platform, we’re becoming the trusted partner for the people who build our homes.

We are backed by $25M+ raised from Y Combinator, Initialized, and Greycroft. Our team is distributed across hubs in Austin, São Paulo, and Buenos Aires, and we are deeply focused on building intuitive, high-impact solutions that make a real difference for our users.

**Site Reliability Engineer at Handoff**

You will own and elevate the reliability, scalability, and observability of Handoff’s platform. This is a hands-on role focused on preventing incidents, improving system resilience, and enabling fast, safe product development.

You’ll work closely with Backend, Fullstack, Data, and AI engineers to ensure our systems are production-ready, observable, and built to scale, while keeping a strong focus on user impact and developer velocity.

This is not a pure ops role. We’re looking for someone who thinks like an engineer, codes regularly, and partners deeply with product and engineering teams.

### **What you'll do**

* Define and implement SLIs, SLOs, and error budgets for critical services, making reliability visible and measurable across the org.
* Design, build, and maintain monitoring, logging, and alerting systems that surface real issues without unnecessary noise.
* Lead and participate in incident response, owning detection, coordination, communication, and resolution.
* Partner with engineers early in the design phase to improve reliability, scalability, and production readiness.
* Improve CI/CD pipelines, deployment strategies, and rollback mechanisms to enable fast and safe releases.
* Automate operational tasks and reduce toil through tooling, scripting, and infrastructure-as-code.
* Ensure backup, disaster recovery, and failover strategies are defined, documented, and tested.
* Monitor infrastructure performance and costs, proposing optimizations that balance reliability, speed, and efficiency.
* Create and maintain runbooks, incident procedures, and reliability documentation to support team scale.

### **About You**

* Strong experience as an SRE, Platform Engineer, DevOps Engineer, or similar reliability-focused role.
* Solid understanding of reliability fundamentals: availability, latency, error rates, throughput, and durability.
* Hands-on experience with cloud platforms like AWS, GCP, or Azure.
* Deep familiarity with observability tools such as Prometheus, DataDog, Grafana, OpenTelemetry, or similar.
* Strong debugging skills and comfort working in high-pressure production incidents.
* Experience improving CI/CD pipelines and release safety.
* Ability to write production-quality code or scripts in languages like Python, Go, or Bash.
* Experience with infrastructure-as-code and automation.
* A pragmatic mindset that balances reliability with product velocity and real-world constraints.
* Strong communication skills and comfort collaborating across engineering, product, and leadership.
* Comfort in a fast-paced environment: you’re quick to adapt to changing priorities and balance rapid iteration with high-quality outputs.

### **What we offer**

* 💸 Competitive **salary in USD**
* 💰 Attractive **stock options**
* 🌴 **Unlimited PTO**
* 🚛 Relocation **allowance**
* 👨‍💻 **Top-notch** equipment
* 🧳 **Team offsites around the world** - we've already been to more than 5 countries!
If you’re motivated by owning reliability end-to-end and shaping infrastructure decisions that impact customers every day, we’d love to hear from you.

**Please note that we will only consider applications submitted in English.**
Job originally posted on: Indeed