Senior AWS Platform Engineer (HPC Enablement)
Job Description
We are looking for a **Senior Cloud Engineer** to own and operate an AWS platform that enables an HPC team to run workloads reliably at scale. You will build standardized infrastructure, automation, observability, and scaling across multi-account AWS and Kubernetes. Apply to help deliver robust cloud foundations.

**Responsibilities**

* Own the AWS environment and platform operations that support HPC workloads at scale
* Provision and manage AWS accounts via internal self-service tooling and standardized patterns
* Build and maintain Terraform code to provision AWS resources and HPC-oriented clusters
* Design and operate centralized CI/CD pipelines to manage all accounts and clusters from a single repository
* Migrate remaining AWS accounts into the central repository and standardize infrastructure patterns
* Operate and support an in-cluster container registry (Harbor) and related platform components
* Implement and complete observability rollout across the AWS environment, including metrics, logs, dashboards, and alerting
* Support Kubernetes cluster operations and troubleshoot platform issues impacting HPC workloads
* Own and improve Cast AI as the primary mechanism for cluster scaling and optimization
* Design and support cross-cloud data transfer and networking solutions such as AWS DataSync and Interconnect between AWS and GCP
* Collaborate with the HPC team to translate requirements into implemented platform solutions
* Coordinate working hours to maintain at least 4 hours overlap with Houston time zone and occasional overlap with Australia

**Requirements**

* 3+ years of hands-on experience with Amazon Web Services in multi-account environments
* Infrastructure-as-code experience with Terraform (HCL/tofu), including modules and state
* Kubernetes operations experience, including troubleshooting clusters and workloads
* Proven ability to lead technical ownership as a staff-level individual contributor and drive standards across teams
* Strong project execution skills to take requirements, evaluate options, and deliver solutions with minimal guidance
* Advanced programming skills in Python for automation, tooling, and integrations
* Strong scripting skills in Bash for operational automation
* Solid CI/CD and GitOps workflow knowledge using tools such as GitLab CI or GitHub Actions
* Strong observability skills across metrics, logs, dashboards, and alerting using Prometheus and Grafana
* Experience with cluster scaling and cost optimization using Cast AI or similar tooling
* Ability to use AI-assisted tools for code generation, debugging, and documentation in daily work
* Upper-Intermediate English proficiency (CEFR B2)

**Nice to have**

* Google Cloud Platform experience, especially in cross-cloud integrations with AWS
* High-performance computing (HPC) experience with schedulers or data-intensive pipelines
Originally posted on: Indeed