Middle SRE / Observability Engineer

EPAM Systems
Senior
Remote 🌐
Published on February 25, 2026

Job Description

We are strengthening our platform team with a **Middle Site Observability Engineer** to keep Kubernetes production services stable for AI research on Azure Stack. You will enhance observability, provide business-hours operational support, and work closely with engineering and research partners to improve reliability and processes. Apply now.

**Responsibilities**

* Develop, operate, and enhance observability capabilities, including dashboards and visualizations in Grafana or similar tools
* Establish and maintain metrics, SLIs, SLOs, and alerting strategies for production platforms
* Deliver business-hours operational support for Kubernetes-based environments through troubleshooting, log analysis, and metrics-driven investigations
* Assist with production operations for SQL-based systems by diagnosing issues and supporting performance investigations
* Investigate incidents and system behavior to identify root causes, participate in post-incident reviews, and propose improvements to monitoring and reliability practices
* Partner with engineering, platform, and research teams to raise observability standards, refine operational processes, and increase system reliability
* Create and maintain documentation, share knowledge across the team, and drive ongoing improvement activities

**Requirements**

* 2+ years of hands-on experience in Site Reliability Engineering, DevOps, or Production Support for live production systems
* Practical knowledge of observability and monitoring stacks such as Grafana, Prometheus, Elastic Stack, or Datadog
* Solid understanding of Linux systems, with strong troubleshooting and log analysis skills
* Background supporting Kubernetes-based production environments
* Working experience with SQL production support, including query troubleshooting and basic performance analysis
* Proficiency in automation scripting using Python, Bash, or similar languages
* Ability to assess incidents, determine root causes, and contribute to continuous improvement efforts
* Effective communication skills and comfort collaborating with distributed, cross-functional teams
* English proficiency at an intermediate to advanced level (B1–C1)

Originally posted on: Indeed
