Overview

Tensorix is a sovereign AI infrastructure platform headquartered in Dublin. We deploy and operate open-source large language models on EU-sovereign infrastructure across Europe, providing private, zero-retention inference for regulated industries including finance, healthcare and government. Our platform offers drop-in OpenAI-compatible APIs, enabling developers and enterprises alike to adopt AI without compromising on data privacy, compliance or performance.


We are looking for a Senior Infrastructure Engineer (GPU Cloud) to join our growing engineering team. Reporting to the CTO, you will own the physical and virtualisation layer that underpins our GPU fleet, from bare-metal server deployment through to multi-tenant GPU-as-a-Service delivery. You will design, build and operate the compute, storage and network infrastructure that our ML serving layer runs on.


We currently operate Dell PowerEdge XE9780 servers with NVIDIA B300 SXM6 GPUs across our European estate, with an aggressive near-term growth plan spanning multiple sites across the EU. You will be the technical owner of our hardware strategy, cluster architecture and infrastructure automation as we build out a sovereign GPU cloud platform.


This is a deeply hands-on role. You will oversee server deployments, configure firmware, debug PCIe bus issues, design network fabrics, architect storage and build the orchestration layer that ties it all together. We are an AI-native team, and tools such as Claude Code and Codex are part of our daily workflow, materially accelerating how we build and operate systems. We value engineers who combine deep systems knowledge with a pragmatic builder mindset.


This is a high-impact senior individual contributor role spanning bare-metal infrastructure, GPU cluster architecture and multi-site estate planning.

Responsibilities

  • Bare-Metal & Firmware Management - Lead the deployment and maintenance of GPU server fleets including BIOS configuration, iDRAC/BMC management, firmware updates (GPU baseboard, FPGA, CPLD, PCIe switches), kernel parameter tuning and driver stack management for NVIDIA B-series GPUs.

  • Hypervisor & Virtualisation - Design and operate our virtualisation layer using Proxmox VE or OpenStack, including GPU passthrough (VFIO-PCI), NVSwitch fabric management through host-level services and multi-tenant GPU allocation.

  • Network Architecture - Design and maintain the network fabric across multiple racks and sites, spanning management VLANs, storage networks, tenant data planes and GPU interconnect. Work with bonded NICs, jumbo frames, InfiniBand and ConnectX adapters. Plan the network topology for multi-site EU deployments.

  • Storage Architecture - Design, deploy and operate shared storage infrastructure across multiple racks and sites, including SAN (Dell ME-series, iSCSI multipath), NFS and local NVMe. Optimise for large model weights (hundreds of GB per model), high-throughput sequential reads and cross-site replication. Own SAN performance tuning, capacity planning and data placement strategy as the estate grows.

  • GPU-as-a-Service Platform - Build the infrastructure layer for multi-tenant GPU delivery, including tenant isolation, resource scheduling, capacity planning and usage metering. Design the platform so customers can consume GPU resources via API without touching the underlying hardware.

  • Cluster Orchestration & Automation - Automate server provisioning, OS deployment, driver installation and cluster configuration. Build infrastructure-as-code for repeatable, auditable deployments across multiple sites.

  • Monitoring & Reliability - Instrument the infrastructure stack with monitoring covering GPU health, NVSwitch fabric status, storage throughput, network utilisation and hardware telemetry (DCGM, iDRAC, IPMI). Own incident response for hardware and infrastructure faults.

  • Hardware Strategy & Estate Planning - Work with the CTO to plan GPU procurement cycles, evaluate server platforms, specify network and storage hardware and manage vendor relationships. Design the infrastructure blueprint for new EU datacentre deployments, defining standard rack layouts, power and cooling requirements, network topology and storage architecture that can be replicated across sites with minimal variance.

  • Security & Compliance - Ensure infrastructure meets the requirements of regulated industries including data residency, tenant isolation, encryption at rest and in transit and audit logging. Support EU sovereignty requirements across our deployment sites.

Skills & Experience

  • 5+ years of professional experience in infrastructure engineering, systems administration or datacentre operations, with a meaningful portion involving GPU or HPC infrastructure

  • Hands-on experience with bare-metal Linux server deployment and management, including kernel tuning, driver management, PCI device configuration and UEFI/BIOS configuration

  • Strong working knowledge of NVIDIA GPU server platforms, including driver installation, NVLink/NVSwitch fabric, Fabric Manager, DCGM and GPU passthrough via VFIO

  • Experience with virtualisation platforms, ideally Proxmox VE or OpenStack, including PCI passthrough for GPU workloads

  • Solid understanding of network design including VLANs, bonding/LACP, jumbo frames, InfiniBand and routing in multi-rack environments

  • Experience with enterprise storage including SAN (iSCSI, FC), NFS, multipath I/O and performance tuning for large sequential workloads

  • Proficiency with Linux (Ubuntu Server and/or Debian), systemd, networking stack (ip, nmcli, netplan) and shell scripting

  • Experience with infrastructure-as-code and automation tooling (Ansible, Terraform or similar)

  • Comfortable using AI-assisted development tools (e.g. Claude Code, Codex) as part of your daily workflow

  • Methodical approach to troubleshooting with the ability to work across firmware, kernel, driver and userspace layers to diagnose complex hardware issues

Nice to Have

  • Experience building or operating GPU cloud / GPU-as-a-Service platforms

  • Familiarity with Dell PowerEdge server management (iDRAC, Redfish API, racadm, Dell SupportAssist)

  • Experience with NVIDIA ConnectX network adapters and OFED/MOFED stack

  • Exposure to Kubernetes with GPU scheduling (NVIDIA GPU Operator, device plugins, MIG)

  • Experience with MAAS, Ironic or other bare-metal provisioning systems

  • Knowledge of InfiniBand fabric management for multi-node GPU training clusters

  • Familiarity with European data sovereignty and compliance frameworks (GDPR, DORA, NIS2)

  • Contributions to open-source infrastructure projects

Education & Qualifications

  • BSc/MSc in Computer Science, Software Engineering, Electrical Engineering, Network Engineering or a related technical discipline, or equivalent practical experience

Remuneration

  • Highly competitive package, dependent on experience

  • 25 days paid annual leave

  • Hybrid working from our centrally located Dublin office, with remote flexibility

  • Occasional travel to our EU datacentre sites as the estate grows

  • Free inference tokens!

******* NO AGENCY ASSISTANCE REQUIRED *******


