Trading, Research & ML

Machine Learning Performance Engineer

Remote · Full-time · Senior · Published: April 22, 2026

About the role

At Uncharted Network, our models retrain intraday and our inference must keep pace with the market. As a Machine Learning Performance Engineer, you will own the performance envelope of our entire ML stack, from batch training throughput to single-digit-millisecond inference latency on the critical execution path. This role demands a whole-systems mindset: you will profile GPU warps, tune memory hierarchies, redesign storage access patterns, and optimise inter-node networking. If you enjoy closing the gap between theoretical FLOP throughput and actual goodput, you will find this environment uniquely satisfying.

Responsibilities

  • Profile and optimise training runs end to end: GPU utilisation, memory bandwidth, collective communication, and storage I/O
  • Develop custom CUDA kernels and Triton programs for performance-critical model components
  • Reduce inference latency on the live trading path through kernel fusion, quantisation, and computation graph optimisation
  • Investigate and tune the full hardware stack: NVLink, InfiniBand, PCIe topology, NUMA layout, and host-GPU transfer patterns
  • Work with ML Researchers and ML Engineers to co-design models with hardware performance constraints in mind
  • Benchmark and document performance gains with rigorous, reproducible methodology

Requirements

  • Deep practical knowledge of GPU architecture: warps, cooperative groups, memory hierarchy, and Tensor Core utilisation
  • Hands-on experience with CUDA, PTX/SASS, and profiling and debugging tools (Nsight Systems, Nsight Compute, CUDA-GDB)
  • Strong familiarity with ML frameworks at the C++/CUDA level (PyTorch internals, JAX/XLA)
  • Understanding of distributed training networking: NCCL, InfiniBand/RoCE, GPUDirect, and collective communication algorithms
  • Solid general programming skills in Python and C++
  • Ability to interrogate performance from first principles and communicate findings rigorously

Preferred qualifications

  • Experience with Triton or CUTLASS for custom kernel authoring
  • Knowledge of inference optimisation techniques: INT8/FP8 quantisation, speculative decoding, or batched attention
  • Background in low-latency systems engineering: networking, storage, and OS-level scheduling

What we offer

  • Competitive UNT token allocation + fiat salary
  • Fully remote with async-first culture
  • Dedicated GPU cluster access — profile and optimise on real production workloads at scale
  • Top-tier hardware setup stipend
  • Annual performance-engineering conference and technical learning budget
© 2026 Uncharted Network. All rights reserved.