Requirements
• BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, or related field,
• 4+ years of hands-on ML compiler or systems engineering experience,
• Demonstrated experience building and owning an end-to-end compiler stack (front-end, IR, optimization, and backend code generation),
• Experience working with machine learning models, neural network graphs, and graph optimizations as part of lowering and acceleration, using frameworks like TVM, XLA, or Glow,
• Comfortable collaborating with hardware teams to map novel architectural primitives from IR to efficient lowerings, kernel implementations, and runtime support,
• Strong understanding of compiler performance trade-offs, profiling, bottleneck analysis, and optimization strategies for ML workloads,
• (Desirable) Prior experience on compilers for AI/ML accelerators, GPUs, DSPs, or domain-specific architectures,
• (Desirable) Contributions to LLVM, MLIR, XLA, TVM, or related open-source compiler projects,
• (Desirable) Experience in kernel performance optimization and accelerator-specific code generation,
• (Desirable) Demonstrated work in hardware-software co-design where compiler insights shaped ISA or architectural decisions,
• (Desirable) Experience building or contributing to cycle-accurate simulators for performance modeling,
• (Desirable) Prior work building profiling tools, performance evaluation suites, or bottleneck analyzers for compiler or runtime stacks,
• (Desirable) Familiarity with deep learning frameworks and model formats (e.g., JAX, ONNX, PyTorch, TensorFlow) and graph transformations,
• (Desirable) Experience designing custom IR dialects, optimization passes, and domain-specific lowering transformations
What the job involves
• We're building an AI accelerator from the ground up, and we need a strong ML compiler engineer at the heart of hardware-software co-design. This isn't about inheriting a mature compiler stack: it's about creating one,
• You'll join at the architecture definition stage, directly influencing ISA design and the trade-offs that determine what our hardware can do. As we progress toward hardware bringup, you'll build the complete compiler toolchain that takes machine learning models from high-level frameworks down to efficient execution on our novel architecture,
• This role offers the rare opportunity to shape both silicon and software simultaneously. You'll work alongside hardware architects and researchers to co-design compiler strategies that unlock the full potential of our accelerator, building infrastructure that bridges the gap between ML model graphs and custom ISA primitives,
• Your compiler decisions will directly inform hardware features, and hardware capabilities will open new optimization frontiers for your toolchain,
• If you want to architect a compiler stack from first principles, optimize ML workloads on new hardware, and see your decisions realized in silicon, this is the role,
• Work across the full stack with software, systems, and hardware teams to ensure correctness, performance, and deployment readiness for real workloads,
• Contribute to shaping the long-term compiler architecture and tooling strategy in a fast-moving startup environment,
• Design and implement parts of the compiler stack targeting our novel AI accelerator, including front-end lowering, IR transformations, optimization passes, and backend code generation,
• Build and evolve MLIR/LLVM-based infrastructure to support graph lowering, hardware-aware optimizations, and performance-centric code emission,
• Collaborate closely with hardware architects, microarchitects, and research teams to co-design compiler strategies that align with evolving ISA and hardware constraints,
• Develop profiling and analysis tools to identify performance bottlenecks, validate generated code, and ensure high throughput/low latency execution of AI workloads,
• Enable efficient mapping of high-level ML models to hardware by working with model frameworks and graph representations (e.g., ONNX, JAX, PyTorch),
• Drive performance tuning strategies including kernel authoring, schedule generation, and hardware-specific optimization passes
Apply Now