Technology, DevOps/Site Reliability Engineer
BTIG, LLC
San Francisco, CA, US
Onsite
2026-07-02
Announced salary: $160,000 - $200,000
Market in San Francisco (BLS OEWS 2025): Low $97K · Median $124K · High $159K
Estimated net pay: $9,173 - $11,221/month (31% withheld after tax & contributions; single, no dependents)
Job description
**Job Purpose:**
BTIG seeks a DevOps/Site Reliability Engineer to join our technology team. This role is central to improving developer velocity by handling production application support escalations, managing and evolving our infrastructure stack, and providing operational continuity across the team. The ideal candidate thrives at the intersection of software operations, infrastructure engineering, and customer-facing support, and is energized by the opportunity to progressively take ownership of critical platform systems within a financial services environment.
In this role, you will be a force multiplier for a focused engineering team: the person who keeps production running smoothly, evolves the platform, and creates the space for developers to build. You'll gain broad, hands-on exposure across the full stack in an environment where reliability directly impacts trading operations.
**Duties & Responsibilities:**
* Serve as the primary point of contact for production application escalations: triage, diagnosis, and resolution across application components and services.
* Monitor application and infrastructure health; investigate anomalies and remediate without pulling developers off feature work.
* Develop and maintain runbooks, escalation procedures, and operational knowledge base documentation.
* Identify recurring issues and collaborate with developers to drive root-cause fixes.
* Own incident response and post-incident review processes.
* Manage, maintain, and improve the team's infrastructure stack:
    * Reverse proxy & traffic management
    * Identity & access management
    * Secret/configuration management
    * Certificate management
    * SQL and NoSQL databases
    * OpenTelemetry (OTel): metrics, traces, logs & analytics
    * Container orchestration
    * Event streaming
* Automate provisioning, deployment, and configuration management.
* Plan and execute upgrades, patches, security hardening, and capacity management.
* Evolve infrastructure toward greater reli