Staff Machine Learning Engineer

Unity Technologies

Sacramento, CA, US

Onsite 2026-07-02

Announced salary

$167,200 - $250,800

Low $85K · Median $112K · High $148K

Market in Sacramento · BLS OEWS 2025

Estimated net pay

$9,527 - $13,726 /month · 32% withheld

After tax & contributions · Single, no dependents


Job description

**The opportunity**

We are building the next generation of AI-driven game experiences, running generative models on-device, right where the players are: on phones, tablets, laptops, and desktops. Our games run inside a modern, browser-native runtime (built on technologies such as WebGPU and WebNN), so the models that power these experiences must be deployed and accelerated entirely within that runtime.

As a Senior Machine Learning Engineer for On-Device & Mobile AI, you will take state-of-the-art multi-modal models (transformers, diffusion networks, and vision-language models, or VLMs) and make them run **fast, small, and reliably** on mobile and constrained hardware.

This is a deeply hands-on role. You will own the optimization and deployment of significant parts of the inference stack, from a trained checkpoint leaving research, through export, quantization, and kernel-level tuning, to a shipped feature running inside the engine at interactive frame rates within a fixed memory and power budget. Your work directly shapes the latency, quality, memory footprint, and battery profile of AI features experienced by billions of players.

This role is for an engineer who is energized by the gap between a research model and a shipping, on-device product. If you enjoy profilers, frame captures, op-fusion, and shaving milliseconds and megabytes, this is your role.

**What you'll be doing**

* Inference & On-Device Optimization
  * Own the optimization pipeline for the models you ship: model export, graph transformation, operator fusion, memory-layout planning, and hardware-specific tuning across NPU, mobile GPU, and desktop/laptop GPU.
  * Apply quantization (INT4/INT8/FP16), weight sharing, structured/unstructured pruning, and knowledge distillation to hit hard latency, memory, and power budgets, and validate them against quality bars.
  * Do low-level performance work: write and tune WebGPU compute shaders (WGSL) and, where relevant, native ke
