Senior/Staff Software Engineer - Machine Learning & System Optimization
Zoox
Boston, MA, US
Onsite
2026-07-01
Announced salary: $226,000 - $307,000
Market in Boston (BLS OEWS 2025): Low $105K · Median $147K · High $189K
Estimated net pay: $13,037 - $17,044/month after tax & contributions (31% withheld; single, no dependents)
Job description
The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence.
As a Machine Learning and System Optimization Engineer, you will orchestrate and allocate overall system capacity across the core perception models running on-bot, and drive large initiatives that make inference more efficient by sharing parts of the perception stack across models.
You will focus on bringing highly efficient, production-ready large-scale models to our on-vehicle stack. We are looking for experts with hands-on experience compressing, accelerating, and deploying complex models, including LLMs, VLMs, or foundation models, for power- and thermal-constrained vehicle SoCs.
In addition, you will optimize ML models, write custom CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution on edge devices.
### **In this role, you will:**
* Allocate and distribute system resources (CPU/GPU/interconnect) to various models and inference engines running on the robot.
* Spearhead cross-cutting initiatives that allow for better compute utilization through sharing/fusing models and better scheduling strategies.
* Optimize large-scale models (Multi-Modal Sensor Fusion models, LLMs, VLMs) using advanced quantization (PTQ, QAT), pruning, mixed-precision inference frameworks, and parameter-efficient fine-tuning (LoRA, QLoRA).
* Architect and implement model conversion and compilation pipelines using TensorRT for edge deployment.
* Write production-level, low-latency, memory-safe C++ and CUDA code for real-time inference on vehicle systems.
### **Qualifications:**
* Deep experience in system and performance optimization in CPU/GPU systems designed for low latency or high throughput.
* Deep expertise in working with real-time systems and their constraints, such as processing latency, memory utilization, and memory ban