Our mission at Tensorwave Cloud is to build seamless, secure, reliable, and resilient AI infrastructure at scale, eliminating barriers and challenging the status quo to empower builders and support AI innovation.
About the role
We’re building and operating high-performance, large-scale AI workload data centers — and Kubernetes is at the heart of everything we do. As a Senior Kubernetes Platform Engineer, you’ll work closely with a team of experienced engineers to design, implement, and optimize secure, bare-metal Kubernetes infrastructure for both internal workloads and managed customer environments.
You will partner on architectural initiatives, drive innovation across ingress/egress solutions, and help harden our multi-tenant Kubernetes offerings. This role is ideal for deeply technical engineers who are passionate about performance, scale, and reliability in a fast-paced AI-native environment.
Responsibilities
Design and deploy bare-metal Kubernetes clusters at scale using RKE2
Collaborate with senior engineers on architectural improvements, infrastructure planning, and automation
Lead the design and implementation of ingress and egress traffic solutions, leveraging HAProxy, Cilium, and other components
Contribute to multi-tenant environment designs including VPC-level isolation, network policy enforcement, and secure shared services
Drive continuous improvement around observability using Prometheus and related tooling
Serve as a subject matter expert in core Linux, networking, and Kubernetes internals
Collaborate cross-functionally with AI platform teams and internal/external customers
Required Experience
7+ years of experience in infrastructure engineering roles at a CSP or in a hyperscaler environment
5+ years of hands-on experience managing Kubernetes in bare-metal environments
Proven expertise in designing multi-tenant Kubernetes clusters with strong network isolation
Deep understanding of Linux system internals, networking (iptables, CNI plugins, BGP), and DNS
Experience with ingress controllers, load balancing, and service mesh (e.g., HAProxy, Cilium, Envoy)
Strong infrastructure-as-code mindset using tools like Helm, Terraform, or Ansible
Experience monitoring Kubernetes workloads with Prometheus and related observability tools
Preferred Experience
Familiarity with RKE2, Rancher, or other downstream Kubernetes distributions
Exposure to AI/ML infrastructure workloads or GPU resource scheduling
Experience in infrastructure compliance or secure multi-tenancy (e.g., PCI DSS, SOC 2)
What We Bring
Mission-driven company
Competitive Salary
Stock Options
100% paid Medical, Dental, and Vision insurance
Flexible PTO
Paid Holidays
401(k)
Parental Leave
Flexible Spending Account
Short Term Disability Insurance
Life and Voluntary Supplemental Insurance
Mental Health Benefits through Spring Health
We’re looking for resilient, adaptable people to join our team: people who believe in the mission and think at massive scale. The solutions that worked on a handful of devices will not work at exascale. Be prepared to be pushed daily, to learn a lot, and to build the future.
Tensorwave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, national origin, or veteran status.