Chief Site Reliability Engineer

EPAM Systems
Manager
Remote 🌐
Published on November 20, 2025

Job Description

We are looking for a **Chief Site Reliability Engineer** to oversee the maintenance and enhancement of enterprise applications and their infrastructure through advanced DevOps methodologies. In this role, you will lead the application of CI/CD pipelines, infrastructure automation with Terraform, and Kubernetes cluster management to ensure robust and secure cloud environments. Join us to drive reliability and operational excellence at scale.

**Responsibilities**

* Maintain and improve enterprise applications and infrastructure by applying DevOps best practices
* Design and manage CI/CD pipelines to enable continuous software deployment
* Automate infrastructure provisioning and management via Terraform
* Administer Kubernetes clusters with a focus on security and high availability
* Monitor system health and performance, implementing enhancements to boost reliability
* Collaborate with software development teams to refine deployment and automation workflows
* Address operational requests and coordinate maintenance activities to uphold system stability
* Enforce security standards in infrastructure and application deployment processes
* Resolve complex issues related to cloud infrastructure and application performance
* Work with cross-functional teams to facilitate enterprise-scale software releases
* Document system configurations, operational procedures, and troubleshooting instructions
* Assess and integrate new tools and technologies to optimize infrastructure operations

**Requirements**

* Advanced proficiency in Python with at least 7 years of experience
* Proven expertise with Amazon Web Services and Microsoft Azure, including API, authentication, and serverless components
* Comprehensive knowledge of cloud networking, Kubernetes cluster management, security, IAM, and automation
* Strong grasp of CI/CD concepts, source control, containerization, and infrastructure as code using Terraform
* Experience enabling and enhancing Infrastructure as a Service (IaaS) solutions
* Background in enterprise-level software development and release management
* In-depth understanding of automation principles related to CI/CD and IaaS
* Exceptional analytical and problem-solving skills for complex issues
* Ability to manage operational requests and maintenance incidents effectively
* Proficient English communication skills, both written and verbal (B2+)

Job originally posted on: Indeed
