Software Engineer, Multimodal Storage Infrastructure

Eventualcomputing

San Francisco

Hybrid 2026-07-04

Announced salary

$150,000 - $250,000

Low: $119K · Median: $166K · High: $214K

Market in San Francisco · BLS OEWS 2025


Job description

ABOUT EVENTUAL

Every breakthrough Physical AI system (humanoid robots, autonomous vehicles, video generation models) is trained on petabytes of video, lidar, radar, and sensor data. But today's data platforms (Databricks, Snowflake) were built for spreadsheet-like analytics. They don't know how to index a clip by content, co-locate sensors on the same row as video, version multimodal datasets, or push predicates down to a corpus of MP4s. Robotics and video-AI teams build the missing layer themselves: stitching together five to eight tools, organizing disorganized video and sensor data, building schemas and versioning that don't exist. "It was rebuilding what Databricks built 15 years ago for analytics, just for AI data."

Eventual was founded in 2022 to ship that layer once. Our open-source engine, Daft (https://daft.ai/), is the distributed data engine purpose-built for multimodal AI. It already handles 2 PB/day at Amazon and 60-100 PB at another FAANG company, and it is in production at Mobileye, TogetherAI, and CloudKitchens.

We are building a multimodal warehouse on top of our engine for Physical AI: video, sensors, and sim outputs co-indexed on the same row, aligned on timecode, and versioned, with a content-aware query layer on top. We're building this in partnership with the top Physical AI labs and public AI infrastructure companies today.

We have raised $30M from Felicis, CRV, Microsoft M12, Citi, Essence, Y Combinator, Caffeinated Capital, Array.vc, and angels including co-founders of Databricks and Perplexity. We've assembled a world-class team from AWS, Render, Pinecone, and Tesla. We have spent our careers powering the last generation of Physical AI in self-driving, and are excited to now do this for the next.

Join our small (but powerful!) team working together 4 days/week in our SF Mission district office.

YOUR ROLE

As a Storage Infrastructure Engineer, you'll take everything we know about modern databases and apply it to the world of Physical AI.
Our warehouse co-indexes video, sensors, embeddings, and sim outputs on the same row, versioned, with a third kind of query layer (not row/column, not vector/semantic): content-aware queries over what's inside clips. Your job is to make that layer fast: the right indices for petabyte-scale video, predicate pushdowns that elide whole files, file formats that respect random access into clips, and a query path that turns "left-arm grasp failures on deformable objects" into the smallest possible read. You should believe, in your bones, that the best read is the read elided.

KEY RESPONSIBILITIES

- Design and build the storage and indexing layer: row groups, column chunks, secondary indices, vector indices, and the metadata that lets queries skip everything that doesn't matter.
- Push the query engine harder (predicate pushdown, projection pushdown, late materialization) across multimodal columns including video, embeddings, and sensor streams.
- Choose, extend, or build on top of modern open formats (Parquet, Iceberg, Delta, etc.), and build our own or contribute upstream where it makes sense.
- Build versioning and schema evolution for multimodal datasets so customer data stays reproducible across months of experimentation.
- Partner with the Dataloading team on the format-to-loader boundary so an iceberg.scan(...) translates into the absolute minimum of bytes hitting NVMe.
- Partner with the Visual Understanding team to land model outputs in the index without an external glue layer.

WHAT WE LOOK FOR

- You love thinking about indices. B+ trees, LSM trees, bitmap indices, vector indices, learned indices: you have favorites and you have grudges.
- You love thinking about query engines. Predicate pushdown makes you happy. Late materialization makes you happier.
- Strong familiarity with the storage hierarchy (cloud object stores, NVMe, block storage, spinning disk, RAM, GPU memory) and the latency and cost of moving between them.
- Strong opinions about Parquet. Love it or hate it, you've earned the opinion. Same for Iceberg, Delta, Lance, and the other lakehouse formats.
- A real love for databases and query systems. You read database papers for fun.
- You believe the best read is the read elided.

NICE TO HAVE

- Background from a storage or table-format team: Lance, Iceberg, Delta, Hudi, Spiral, Snowflake, BigQuery, Databricks Photon, DuckDB, ClickHouse, or similar.
- You've attempted to build your own database before. Or, at minimum, fantasized about it in detail.
- Experience with Rust or modern C++ for storage engines.
- Hands-on time with vector indices (HNSW, IVF, ScaNN) or hybrid retrieval systems.
- Comfort with the OLAP/lakehouse ecosystem: catalogs, file layout, compaction, manifest formats, time travel.

PERKS & BENEFITS

- In-person, tight-knit team: 4 days/week in our SF Mission office.
- Competitive comp and meaningful startup equity.
- Catered lunches and dinners for SF employees.
- Commuter benefit.
- Team-building events and poker nights.
- Health, vision, and dental coverage.
- Flexible PTO.
- Latest Apple equipment.
- 401(k) plan with match.

If you've ever read a Parquet footer for fun and thought "this is so close to what video needs, and yet so far", we should talk.
