Senior/Staff Software Engineer - Machine Learning & System Optimization

Zoox

Boston, MA, US

Onsite 2026-07-01

Announced salary: $226,000 - $307,000

Market in Boston (BLS OEWS 2025): Low $105K · Median $147K · High $189K

Estimated net pay: $13,037 - $17,044/month after tax and contributions (31% withheld; single, no dependents)


Job description

The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence. As a Machine Learning and System Optimization Engineer, you will orchestrate and allocate overall system capacity across the core perception models running on-bot, and drive large initiatives that make inference more efficient by sharing parts of the perception stack between models. You will focus on bringing highly efficient, production-ready large-scale models to our on-vehicle stack. We are looking for experts with hands-on experience compressing, accelerating, and deploying complex models, including LLMs, VLMs, and foundation models, for power- and thermal-constrained vehicle SoCs. You will also optimize ML models, write custom CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution on edge devices.

### **In this role, you will:**

* Allocate and distribute system resources (CPU/GPU/interconnect) across the models and inference engines running on the robot.
* Spearhead cross-cutting initiatives that improve compute utilization by sharing or fusing models and by adopting better scheduling strategies.
* Optimize large-scale models (multi-modal sensor fusion models, LLMs, VLMs) using advanced quantization (PTQ, QAT), pruning, mixed-precision inference frameworks, and parameter-efficient fine-tuning (LoRA, QLoRA).
* Architect and implement model conversion and compilation pipelines using TensorRT for edge deployment.
* Write production-level, low-latency, memory-safe C++ and CUDA code for real-time inference on vehicle systems.

### **Qualifications:**

* Deep experience in system and performance optimization of CPU/GPU systems designed for low latency or high throughput.
* Deep expertise in working with real-time systems and their constraints, such as processing latency, memory utilization, and memory bandwidth.
