Senior SRE / Observability Engineer

EPAM Systems
Senior
Remote 🌐
Published on February 25, 2026

Job Description

We are looking for a **Senior SRE / Observability Engineer** to ensure the reliability and performance of production Kubernetes-based systems supporting AI research within an Azure Stack environment. This position focuses on observability, operational support and collaboration with engineering and research teams to drive operational excellence.

**Responsibilities**

* Build, maintain and improve observability solutions, including dashboards and visualizations using Grafana or similar tools
* Define, implement and manage metrics, SLIs, SLOs and alerting strategies for production systems
* Provide business-hours operational support for Kubernetes-based environments, including troubleshooting, log analysis and metric-driven investigations
* Support and troubleshoot SQL-based systems as part of production operations, assisting with issue analysis and performance investigations
* Analyze incidents and system behaviors to identify root causes, contribute to post-incident reviews and recommend improvements to monitoring and reliability practices
* Collaborate with engineering, platform and research teams to improve observability standards, operational processes and system reliability
* Contribute to documentation, knowledge sharing and continuous improvement within the team

**Requirements**

* 3+ years of experience in Site Reliability Engineering, DevOps or Production Support roles supporting production systems
* Knowledge of observability and monitoring stacks such as Grafana, Prometheus, Elastic Stack or Datadog
* Understanding of Linux systems with strong troubleshooting and log analysis skills
* Background in supporting Kubernetes-based environments in production
* Skills in SQL production support, including query troubleshooting and basic performance analysis
* Proficiency in scripting with Python, Bash or similar languages for automation and operational tasks
* Capability to analyze incidents, identify root causes and contribute to continuous improvement initiatives
* Competency in communication and collaboration with distributed and cross-functional teams
* English proficiency at an intermediate to advanced level

Originally posted on: Indeed
