Trading, Research & ML

Machine Learning Performance Engineer

Remote | Full-time | Senior | Posted: April 22, 2026

About the Role

At Uncharted Network, our models retrain intraday and our inference must keep pace with the market. As a Machine Learning Performance Engineer, you will own the performance envelope of our entire ML stack, from batch training throughput to single-digit-millisecond inference latency on the critical execution path. This role demands a whole-systems mindset: you will profile GPU warps, tune memory hierarchies, redesign storage access patterns, and optimise inter-node networking. If you enjoy closing the gap between theoretical FLOP throughput and actual goodput, you will find this environment uniquely satisfying.

Responsibilities

  • Profile and optimise training runs end to end: GPU utilisation, memory bandwidth, collective communication, and storage I/O
  • Develop custom CUDA kernels and Triton programs for performance-critical model components
  • Reduce inference latency on the live trading path through kernel fusion, quantisation, and computation graph optimisation
  • Investigate and tune the full hardware stack: NVLink, InfiniBand, PCIe topology, NUMA layout, and host-GPU transfer patterns
  • Work with ML Researchers and ML Engineers to co-design models with hardware performance constraints in mind
  • Benchmark and document performance gains with rigorous, reproducible methodology

Requirements

  • Deep practical knowledge of GPU architecture: warps, cooperative groups, memory hierarchy, and Tensor Core utilisation
  • Hands-on experience with CUDA, PTX/SASS, and profiling and debugging tools (Nsight Systems, Nsight Compute, CUDA-GDB)
  • Strong familiarity with ML frameworks at the C++/CUDA level (PyTorch internals, JAX/XLA)
  • Understanding of distributed training networking: NCCL, InfiniBand/RoCE, GPUDirect, and collective communication algorithms
  • Solid general programming skills in Python and C++
  • Ability to interrogate performance from first principles and communicate findings rigorously

Nice to Have

  • Experience with Triton or CUTLASS for custom kernel authoring
  • Knowledge of inference optimisation techniques: INT8/FP8 quantisation, speculative decoding, or batched attention
  • Background in low-latency systems engineering: networking, storage, and OS-level scheduling

What We Offer

  • Competitive UNT token allocation + fiat salary
  • Fully remote with async-first culture
  • Dedicated GPU cluster access — profile and optimise on real production workloads at scale
  • Top-tier hardware setup stipend
  • Annual performance-engineering conference and technical learning budget

© 2026 Uncharted Network. All rights reserved.