Senior DevOps Engineer
Job Description
We operate Kubernetes and Linux GPU infrastructure for AI and research teams, with an emphasis on automation, scheduling accuracy, and reliability at scale. In this Senior DevOps Engineer position, you will own Kubernetes administration, implement Volcano queues and policies, and automate day-to-day operations with Python and UNIX shell scripting. Apply now to help optimize shared compute environments.

**Responsibilities**

* Implement and maintain GPU-enabled Kubernetes clusters and standalone Linux compute environments to support reliable workload scheduling and performance
* Configure and run Volcano job scheduling, including queue setup, Pod execution, GPU allocation, and namespace quota enforcement
* Manage Kubernetes environments end-to-end, including namespaces, RBAC, resource quotas, and workload isolation strategies
* Automate job submission, resource provisioning, and system reporting by developing Python and shell scripts
* Collaborate with orchestration, optimization, and observability teams to improve scheduling efficiency, capacity utilization, and researcher workflows
* Monitor infrastructure health and resource utilization, supplying data for optimization and reporting
* Improve infrastructure, tooling, and automation workflows to boost performance, scalability, and usability
* Support operational processes that ensure a seamless experience for researchers running diverse AI and computational workloads

**Requirements**

* At least 3 years of experience in DevOps or infrastructure engineering within complex, large-scale environments
* Expert proficiency in Kubernetes administration and orchestration, including namespaces, Pod scheduling and distribution, PVC, NFS, and resource quota management
* Hands-on Volcano experience for GPU job execution, queue configuration, workload prioritization, and integration with Kubernetes
* Proven experience managing GPU cluster environments in Kubernetes and on standalone Linux compute nodes
* Advanced Python scripting skills for infrastructure automation, along with proficiency in UNIX shell scripting (e.g., Bash)
* Strong Linux system administration skills, including troubleshooting, performance tuning, and configuration management
* Solid understanding of infrastructure automation and orchestration concepts and tooling
* Fluent English communication skills (spoken and written) for direct client interaction

**Nice to have**

* Helm for Kubernetes package management
* Prometheus, Grafana, and Loki for monitoring and observability
* Terraform for Infrastructure as Code
* Multi-cloud Kubernetes background with Amazon EKS and Google GKE
* Azure networking knowledge, including VPN, ExpressRoute, and network security
* Experience with AI-assisted coding tools (e.g., GitHub Copilot, ChatGPT, Claude)
* Exposure to hybrid (cloud + on-premises) scheduling and resource optimization
Originally posted on: Indeed