Chief Site Reliability Engineer

EPAM Systems
Manager
Remote 🌐
Published on November 12, 2025

Job Description

Become a key member of our Enterprise Technology team as a **Chief Site Reliability Engineer**, overseeing critical infrastructure and enterprise applications. You will leverage your expertise in Site Reliability Engineering, CI/CD, cloud platforms, Kubernetes, and security to build resilient and scalable systems. If you are driven to lead innovation and maintain high availability, we invite you to join us.

**Responsibilities**

* Oversee and enhance enterprise application infrastructure through advanced DevOps strategies
* Design and manage CI/CD pipelines to facilitate efficient and dependable software delivery
* Administer and upgrade Kubernetes clusters, ensuring scalability and robust security
* Create and maintain automation tools and scripts, primarily in Python
* Direct cloud infrastructure operations on Amazon Web Services and Microsoft Azure, with emphasis on security and identity management
* Collaborate with development teams to refine infrastructure as code practices using Terraform
* Monitor system performance and implement proactive reliability measures
* Coordinate operational requests and maintenance activities effectively
* Diagnose and resolve complex infrastructure and deployment challenges
* Ensure adherence to security standards and company policies across all systems
* Document infrastructure setups and standard operating procedures comprehensively
* Lead disaster recovery and business continuity initiatives
* Continuously assess emerging technologies to enhance system reliability and efficiency

**Requirements**

* At least 7 years of experience in Site Reliability Engineering or equivalent DevOps roles
* Advanced proficiency in the Python programming language
* Comprehensive experience with Amazon Web Services and Microsoft Azure, including API usage, authentication, and serverless solutions
* Deep understanding of cloud networking, Kubernetes cluster management, security, IAM, and configuration automation
* Strong knowledge of CI/CD workflows, source control systems, containerization, and infrastructure as code with Terraform
* Proven expertise in enabling and improving IaaS environments
* Demonstrated success in managing enterprise-scale software development and deployments
* Thorough understanding of automation techniques related to CI/CD and IaaS
* Exceptional analytical and complex problem-solving abilities
* Effective management of operational requests and maintenance processes
* Strong communication skills with English proficiency at B2+ level

Originally posted on: Indeed
