Senior SRE / Observability Engineer
Job Description
We are looking for a **Senior SRE / Observability Engineer** to ensure the reliability and performance of production Kubernetes-based systems that support AI research in an Azure Stack environment. The role focuses on observability, operational support and collaboration with engineering and research teams to drive operational excellence.

**Responsibilities**

* Build, maintain and improve observability solutions, including dashboards and visualizations in Grafana or similar tools
* Define, implement and manage metrics, SLIs, SLOs and alerting strategies for production systems
* Provide business-hours operational support for Kubernetes-based environments, including troubleshooting, log analysis and metric-driven investigations
* Support and troubleshoot SQL-based systems as part of production operations, assisting with issue analysis and performance investigations
* Analyze incidents and system behavior to identify root causes, contribute to post-incident reviews and recommend improvements to monitoring and reliability practices
* Collaborate with engineering, platform and research teams to improve observability standards, operational processes and system reliability
* Contribute to documentation, knowledge sharing and continuous improvement within the team

**Requirements**

* 3+ years of experience in Site Reliability Engineering, DevOps or Production Support roles supporting production systems
* Knowledge of observability and monitoring stacks such as Grafana, Prometheus, Elastic Stack or Datadog
* Understanding of Linux systems, with strong troubleshooting and log analysis skills
* Experience supporting Kubernetes-based environments in production
* SQL production support skills, including query troubleshooting and basic performance analysis
* Proficiency in scripting with Python, Bash or similar languages for automation and operational tasks
* Ability to analyze incidents, identify root causes and contribute to continuous improvement initiatives
* Strong communication and collaboration skills with distributed and cross-functional teams
* Intermediate to advanced English proficiency
Originally posted on: Indeed