System Development Engineer, Elastic Disaster Recovery, AWS Elastic Disaster Recovery
Amazon Web Services
Boston, MA, US
Onsite
2026-06-30
Announced salary: $148,700 - $201,200
Market in Boston (BLS OEWS 2025): Low $113K · Median $147K · High $191K
Estimated net pay: $8,816 - $11,673/month after tax & contributions (29% withheld; single, no dependents)
**DESCRIPTION**
---------------
We are looking for a Systems Development Engineer to build the automation, tooling, and operational infrastructure that keep AWS Elastic Disaster Recovery (DRS), a large-scale, mission-critical service, reliable, secure, and efficient. In this role you will treat operations as a software problem: eliminating manual toil, hardening our deployment and monitoring systems, and ensuring our replication and recovery fleet runs flawlessly across a broad and heterogeneous environment. A key dimension of this role is breadth: DRS supports a wide range of operating systems (multiple Linux distributions and Windows versions) and both x86/64 and ARM64 (Graviton) architectures, so your automation and tooling must be robust across diverse OS and hardware combinations.
Key job responsibilities
* Operational automation: Design and build software that automates infrastructure provisioning, deployments, and recurring operational workflows, reducing manual effort and on-call burden across the DRS fleet.
* CI/CD and deployment safety: Build and improve pipelines, deployment guardrails, and rollback mechanisms to ship changes safely across all regions and platform variants.
* Cross-platform support: Develop and maintain tooling that works reliably across a wide range of operating systems (various Linux distributions and Windows) and both x86/64 and ARM64 (Graviton) architectures.
* Monitoring and resilience: Implement monitoring, alarming, and self-healing systems to detect and remediate issues before they impact customers' replication and recovery operations.
* Scaling and performance: Tune and scale the systems behind continuous replication, capacity management, and recovery orchestration to handle growth gracefully.
* Operational excellence: Drive down ticket and incident volume through durable, programmatic fixes; lead root-cause analysis and contribute to runbooks and operational best practices.
* Security and compliance: Partner with security teams to harden the se