AI Accelerator Software Principal Engineer – Runtime Library

Ampere Computing

Portland, OR, US

Onsite 2026-07-03

Announced salary

$182,000 - $273,000

Estimated net pay

$9,974 - $14,440

/month · 34% withheld

after tax & contributions · Single, no dependents



Job description

**Description**

**Invent the future with us.**

Ampere is a semiconductor design company for a new era, leading the future of computing with an innovative approach to CPU design focused on high-performance, energy efficient AI compute. As a pioneer in the new frontier of energy efficient high-performance computing, Ampere is part of the Softbank Group of companies driving sustainable computing for AI, Cloud, and edge applications. Join us at Ampere and work alongside a passionate and growing team - we’d love to have you apply!

**About the Role**

As an AI Accelerator Principal Software Engineer – Runtime Library, you will lead the design, development, and optimization of AI runtime software that enables multiple state-of-the-art deep learning models to run efficiently on Ampere’s deep learning accelerators. You will work at the intersection of systems software, performance engineering, and AI enablement, helping deliver high-throughput, low-latency inference and a strong foundation for future model and framework support.

**What You’ll Achieve:**

* **Build and evolve an AI Runtime Library** for Ampere accelerators that supports execution, scheduling, and lifecycle management of deep learning workloads across multiple model types and popular frameworks.
* **Own end-to-end acceleration paths**, going deep into the full SW/HW stack, including:
  + Inference serving and integration layers
  + Compiler/runtime interfaces and graph/IR execution flows
  + Runtime library architecture (APIs, memory management, operators, execution engines)
  + Communication mechanisms and device/host orchestration
* **Drive HW/SW co-design and optimization** to improve:
  + Throughput (tokens/requests per second)
  + Latency (kernel execution and scheduling efficiency)
  + Memory efficiency (buffering, paging, reuse, caching)
  + Overall compute utilization and scaling behavior
* **Contribute to AI co-processor/accelerator software en
