System Development Engineer, Elastic Disaster Recovery, AWS Elastic Disaster Recovery

Amazon Web Services

Boston, MA, US

Onsite 2026-06-30

Announced salary

$148,700 - $201,200

Market in Boston · BLS OEWS 2025: Low $113K · Median $147K · High $191K

Estimated net pay: $8,816 - $11,673 /month · 29% withheld (after tax & contributions · Single, no dependents)


Job description

**DESCRIPTION**
---------------

We are looking for a Systems Development Engineer to build the automation, tooling, and operational infrastructure that keep this large-scale, mission-critical service reliable, secure, and efficient. In this role you will treat operations as a software problem: eliminating manual toil, hardening our deployment and monitoring systems, and ensuring our replication and recovery fleet runs flawlessly across a broad and heterogeneous environment.

A key dimension of this role is breadth: DRS supports a wide range of operating systems (multiple Linux distributions and Windows versions) and both x86/64 and ARM64 (Graviton) architectures, so your automation and tooling must be robust across diverse OS and hardware combinations.

Key job responsibilities

* Operational automation: Design and build software that automates infrastructure provisioning, deployments, and recurring operational workflows, reducing manual effort and on-call burden across the DRS fleet.
* CI/CD and deployment safety: Build and improve pipelines, deployment guardrails, and rollback mechanisms to ship changes safely across all regions and platform variants.
* Cross-platform support: Develop and maintain tooling that works reliably across a wide range of operating systems (various Linux distributions and Windows) and both x86/64 and ARM64 (Graviton) architectures.
* Monitoring and resilience: Implement monitoring, alarming, and self-healing systems to detect and remediate issues before they impact customers' replication and recovery operations.
* Scaling and performance: Tune and scale the systems behind continuous replication, capacity management, and recovery orchestration to handle growth gracefully.
* Operational excellence: Drive down ticket and incident volume through durable, programmatic fixes; lead root-cause analysis and contribute to runbooks and operational best practices.
* Security and compliance: Partner with security teams to harden the se
