Lead Site Reliability Engineer

EPAM Systems
Lead
Remote 🌐
Published on November 12, 2025

Job Description

Become a key member of our Enterprise Technology group as a **Lead Site Reliability Engineer** focused on maintaining and advancing critical infrastructure and applications. You will apply your expertise in DevOps, cloud environments, and automation to design resilient, scalable systems. If you thrive on improving infrastructure and promoting continuous delivery practices, we invite you to join our team.

**Responsibilities**

* Maintain and enhance enterprise application infrastructure using DevOps principles
* Design and oversee CI/CD pipelines to enable fast and dependable software deployments
* Administer and tune Kubernetes clusters for scalability and security
* Create automation tools and scripts, primarily in Python
* Manage cloud infrastructure across Amazon Web Services and Microsoft Azure, with a focus on security and identity management
* Partner with development teams to improve infrastructure as code via Terraform
* Monitor system metrics to proactively maintain high availability
* Coordinate operational requests and maintenance activities efficiently
* Diagnose and resolve complex infrastructure and deployment challenges
* Ensure adherence to security policies and industry best practices
* Document infrastructure setups and operational standards
* Contribute to disaster recovery and business continuity strategies
* Evaluate and integrate new technologies to boost system reliability and efficiency

**Requirements**

* 5+ years of experience in Site Reliability Engineering or equivalent DevOps roles
* Advanced proficiency in Python programming
* Extensive expertise with AWS and Azure, including APIs, authentication, and serverless services
* Comprehensive knowledge of cloud networking, Kubernetes administration, security, IAM, and configuration automation
* Deep understanding of CI/CD workflows, version control, containerization, and Terraform-based infrastructure management
* Hands-on experience enabling and enhancing IaaS environments
* Proven success in enterprise-scale software development and release processes
* Solid grasp of automation concepts related to CI/CD and infrastructure management
* Strong analytical and complex problem-solving abilities
* Capability to handle operational requests and maintenance tasks effectively
* Excellent communication skills, with English proficiency at B2+ level

Job originally posted on: Indeed
