Trading, Research & ML

Machine Learning Performance Engineer

Remote | Full-time | Senior | Published: April 22, 2026

About the role

At Uncharted Network, our models retrain intraday, and our inference must keep pace with the market. As a Machine Learning Performance Engineer, you will own the performance envelope of our entire ML stack, from batch training throughput to single-digit-millisecond inference latency on the critical execution path. The role demands a whole-systems mindset: you will profile GPU execution down to the warp level, tune memory hierarchies, redesign storage access patterns, and optimise inter-node networking. If you enjoy closing the gap between theoretical peak FLOP throughput and the goodput you actually achieve, you will find this environment uniquely satisfying.

Responsibilities

  • Profile and optimise training runs end to end: GPU utilisation, memory bandwidth, collective communication, and storage I/O
  • Develop custom CUDA kernels and Triton programs for performance-critical model components
  • Reduce inference latency on the live trading path through kernel fusion, quantisation, and computation graph optimisation
  • Investigate and tune the full hardware stack: NVLink, InfiniBand, PCIe topology, NUMA layout, and host-GPU transfer patterns
  • Work with ML Researchers and ML Engineers to co-design models with hardware performance constraints in mind
  • Benchmark and document performance gains with rigorous, reproducible methodology

Requirements

  • Deep practical knowledge of GPU architecture: warps, cooperative groups, memory hierarchy, and Tensor Core utilisation
  • Hands-on experience with CUDA, PTX/SASS, and profiling and debugging tools (Nsight Systems, Nsight Compute, CUDA-GDB)
  • Strong familiarity with ML frameworks at the C++/CUDA level (PyTorch internals, JAX/XLA)
  • Understanding of distributed training networking: NCCL, InfiniBand/RoCE, GPUDirect, and collective communication algorithms
  • Solid general programming skills in Python and C++
  • Ability to interrogate performance from first principles and communicate findings rigorously

Nice to have

  • Experience with Triton or CUTLASS for custom kernel authoring
  • Knowledge of inference optimisation techniques: INT8/FP8 quantisation, speculative decoding, or batched attention
  • Background in low-latency systems engineering: networking, storage, and OS-level scheduling

What we offer

  • Competitive UNT token allocation + fiat salary
  • Fully remote with async-first culture
  • Dedicated GPU cluster access — profile and optimise on real production workloads at scale
  • Top-tier hardware setup stipend
  • Annual budget for performance-engineering conferences and technical learning

© 2026 Uncharted Network. All rights reserved.