Lead Site Reliability Engineer

EPAM Systems
Lead
Remote 🌐
Published on November 20, 2025

Job Description

We are seeking a **Lead Site Reliability Engineer** to oversee enterprise application infrastructure and drive reliability through advanced DevOps practices and tools. You will lead efforts to optimize cloud environments, implement robust CI/CD pipelines, and ensure secure, scalable Kubernetes cluster operations. Join us to influence infrastructure excellence and streamline enterprise software delivery.

**Responsibilities**

* Oversee and improve enterprise application infrastructure using DevOps principles
* Design and maintain CI/CD pipelines to support continuous delivery
* Automate infrastructure setup and management with Terraform
* Manage and secure Kubernetes clusters to guarantee high availability
* Analyze system metrics and implement reliability enhancements
* Partner with development teams to refine deployment workflows and automation
* Respond to operational incidents and maintain system uptime
* Enforce security standards in infrastructure and application deployments
* Diagnose complex issues in cloud infrastructure and application performance
* Collaborate with cross-functional teams to facilitate enterprise software releases
* Create and maintain documentation for system setups, procedures, and troubleshooting
* Assess and adopt new technologies to enhance infrastructure management

**Requirements**

* Expert Python programming skills with over 5 years of experience
* Proven expertise in Amazon Web Services and Microsoft Azure, including API, authentication, and serverless technologies
* Comprehensive knowledge of cloud networking, Kubernetes administration, security, IAM, and automation
* In-depth understanding of CI/CD, version control, containerization, and Terraform-based infrastructure management
* Experience with IaaS enablement and continuous improvement
* Background in enterprise-level software development and release coordination
* Strong grasp of automation concepts related to CI/CD and IaaS
* Exceptional problem-solving and analytical skills
* Capability to effectively manage operational requests and maintenance activities
* Excellent English communication skills, both written and verbal (B2 level or higher)

Job originally posted on: Indeed
