AI Accelerator Software Principal Engineer – Runtime Library
Ampere Computing
Portland, OR, US
Onsite
2026-07-03
Announced salary
$182,000 - $273,000
**Description**
**Invent the future with us.**
Ampere is a semiconductor design company for a new era, leading the future of computing with an innovative approach to CPU design focused on high\-performance, energy\-efficient AI compute.
As a pioneer in the new frontier of energy\-efficient, high\-performance computing, Ampere is part of the SoftBank Group of companies driving sustainable computing for AI, cloud, and edge applications.
Join us at Ampere and work alongside a passionate and growing team \- we’d love to have you apply!
**About the Role**
As an AI Accelerator Principal Software Engineer – Runtime Library, you will lead the design, development, and optimization of AI runtime software that enables multiple state\-of\-the\-art deep learning models to run efficiently on Ampere’s deep learning accelerators. You will work at the intersection of systems software, performance engineering, and AI enablement, helping deliver high\-throughput, low\-latency inference and a strong foundation for future model and framework support.
**What You’ll Achieve:**
* **Build and evolve an AI Runtime Library** for Ampere accelerators that supports execution, scheduling, and lifecycle management of deep learning workloads across multiple model types and popular frameworks.
* **Own end\-to\-end acceleration paths**, going deep into the full SW/HW stack, including:
  + Inference serving and integration layers
  + Compiler/runtime interfaces and graph/IR execution flows
  + Runtime library architecture (APIs, memory management, operators, execution engines)
  + Communication mechanisms and device/host orchestration
* **Drive HW/SW co\-design and optimization** to improve:
  + Throughput (tokens/requests per second)
  + Latency (kernel execution and scheduling efficiency)
  + Memory efficiency (buffering, paging, reuse, caching)
  + Overall compute utilization and scaling behavior
* **Contribute to AI co\-processor/accelerator software en