Benjamin Berhault

Annecy, France

Solo Founder & Full-Stack Engineer. Building 6 products across EdTech, HR-tech, Dating, GovTech, PropTech, and Data Infrastructure. Venture studio operator — own and operate, not sell.

Achievements

6 products built, launched, and scaled across diverse industries.

Real Estate Intelligence

PropTech · Data Platform · 2026–present

Live

Interactive map platform scoring French property transactions as investment opportunities. Official DVF data, PostGIS analytics, vector tile visualization.

Python · Dagster · dbt · PostGIS · MapLibre · Tippecanoe · D3.js · FastAPI · Docker

Impact Outcome

Building-level investment scoring across 2 departments, extensible to all 96 metropolitan French departments

Built investment scoring engine on official French government property data

Problem

French property investors rely on listings (biased) or gut feeling. Official transaction data (DVF) is public but raw and unusable without significant processing.

What I did

Built a full data pipeline: Dagster orchestration, dbt transforms, PostGIS analytics computing commune-level statistics (percentiles, medians, yield estimates). Generated vector tiles with Tippecanoe for MapLibre rendering. Scoring algorithm ranks every transaction 0-100 relative to its local market — a property at the 15th percentile in its commune scores 85 (opportunity).
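
The scoring runs in dbt/PostGIS in production; as a rough sketch of the relative-scoring idea in pandas (column names are hypothetical):

```python
import pandas as pd

# Hypothetical columns: 'commune' (INSEE code) and 'price_m2'
# (price per square metre from a cleaned DVF transaction table).
def score_transactions(df: pd.DataFrame) -> pd.DataFrame:
    # Percentile rank of each sale within its own commune, in (0, 1].
    pct = df.groupby("commune")["price_m2"].rank(pct=True)
    # Invert: cheap relative to the local market = high score,
    # so a sale at the 15th percentile scores 85.
    df["score"] = ((1 - pct) * 100).round().astype(int)
    return df

sales = pd.DataFrame({
    "commune": ["74010"] * 4,
    "price_m2": [2100, 3400, 4800, 5200],
})
print(score_transactions(sales))  # scores: 75, 50, 25, 0
```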

Key Constraint

DVF data has inconsistencies — missing coordinates, multi-lot transactions, price anomalies. Had to build robust cleaning pipelines before scoring was meaningful.
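
A minimal illustration of those cleaning rules, with hypothetical column names (the raw DVF export uses French field names):

```python
import pandas as pd

def clean_dvf(df: pd.DataFrame) -> pd.DataFrame:
    # Drop sales that were never geocoded.
    df = df.dropna(subset=["latitude", "longitude"])
    # Multi-lot transactions bundle several units into one price,
    # which distorts price-per-m2 statistics.
    df = df[df["lots"] <= 1]
    # Trim extreme price anomalies (symbolic EUR1 sales, data-entry errors).
    low, high = df["price_m2"].quantile([0.005, 0.995])
    return df[df["price_m2"].between(low, high)]
```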

Lesson Learned

Relative scoring (vs local market) is far more actionable than absolute prices. A EUR150K apartment can be a great deal in one commune and overpriced in the next.

Scale: Millions of DVF transactions, building-level geospatial resolution

Data Infrastructure · Open Source · 2025–present

Live

Progressive, self-hosted data platform — from DuckDB on a laptop to full Spark/Kafka/Flink enterprise stack. 20+ composable services, 5 deployment levels. Apache 2.0.

Docker · Dagster · dbt · Spark · Kafka · Flink · Trino · PostgreSQL · MinIO · Superset · Keycloak

Impact Outcome

20+ composable services, 5 deployment levels, 80% of Databricks at 5% of the cost

Designed progressive 5-level data platform architecture — DuckDB to Spark/Kafka

Problem

Small companies either overspend on Databricks/Snowflake (USD2-5K+/mo) or waste weeks assembling Docker services manually. No tool offered an honest progressive path.

What I did

Architected a composable platform with 5 deployment levels (Level 0: DuckDB+dbt at USD0/mo → Level 4: Spark+Kafka+Flink+Keycloak at USD150-300/mo). Built a Python CLI for interactive component selection, a docker-compose generator, and 20+ pre-configured services with health checks and resource limits. Apache 2.0 licensed.
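
A toy sketch of the generator idea: each level selects services from a catalogue and a compose file is emitted. Service names, images, and level contents below are illustrative, not the project's actual presets:

```python
import yaml  # PyYAML

# Illustrative catalogue; the real platform ships 20+ services,
# each with health checks and resource limits like these.
CATALOGUE = {
    "postgres": {
        "image": "postgres:16",
        "healthcheck": {"test": ["CMD-SHELL", "pg_isready -U postgres"],
                        "interval": "10s", "retries": 5},
    },
    "dagster": {"image": "dagster/dagster-webserver", "depends_on": ["postgres"]},
    "kafka": {"image": "apache/kafka:3.7.0"},
}

LEVELS = {1: ["postgres"], 2: ["postgres", "dagster"],
          4: ["postgres", "dagster", "kafka"]}

def generate_compose(level: int) -> str:
    services = {name: CATALOGUE[name] for name in LEVELS[level]}
    return yaml.safe_dump({"services": services}, sort_keys=False)

print(generate_compose(2))
```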

Key Constraint

Ensuring 20+ services work together reliably across 5 different configurations. Health checks, resource limits, and dependency ordering are critical and unglamorous work.

Lesson Learned

Most companies will never need Level 3+. The value is in giving them an honest framework to know when to upgrade — not pushing them to over-engineer from day one.

Scale: 20+ Docker services, 31 Dockerfiles, 5 preset environments

CMS Platform

GovTech · Emergency Services · 2025–present

Building

Real-time geospatial monitoring for emergency dispatch. C++ routing engine, Kafka streaming, GPS tracking, coverage analysis with building-level isochrones.

C++ · RoutingKit · Kafka · Redis · FastAPI · MapLibre · Deck.gl · PostGIS · Protobuf · Docker

Impact Outcome

Building-level coverage analysis with real-time GPS tracking, sub-second route computations

Built C++ real-time routing engine for emergency services coverage analysis

Problem

French fire services (SDIS/BSPP) need to know if every building in their territory can be reached within response time targets. Existing tools are slow and batch-oriented.

What I did

Built a C++ routing engine using RoutingKit on OSM road graphs with contraction hierarchies, forbidden turns, and one-way streets. Kafka streams GPS positions in real-time from dispatch systems. Coverage isochrones computed per-building (not per-zone) with parallel workers. Frontend renders 31K+ buildings with viewport-based culling on MapLibre.
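
A rough Python sketch of the streaming side, assuming a hypothetical topic name and a hypothetical JSON endpoint in front of the C++ engine (the real service speaks Protobuf):

```python
import json
import requests
from kafka import KafkaConsumer  # kafka-python

ROUTING_URL = "http://localhost:8080/route"  # hypothetical engine endpoint
TARGET_SECONDS = 600                         # example response-time target

consumer = KafkaConsumer(
    "vehicle-positions",                     # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b),
)

for msg in consumer:
    pos = msg.value  # e.g. {"vehicle_id": "...", "lat": 45.9, "lon": 6.1}
    # Ask the routing engine for travel time to one building centroid.
    resp = requests.get(ROUTING_URL, params={
        "from": f"{pos['lat']},{pos['lon']}",
        "to": "45.899,6.129",                # hypothetical building
    })
    covered = resp.json()["seconds"] <= TARGET_SECONDS
```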

Key Constraint

C++ binary distribution — cannot source-compile in CI without exposing RoutingKit internals. Had to pre-compile and distribute binaries.

Lesson Learned

GovTech sales cycles are 12-18 months minimum. The technical product can be ready in weeks, but the procurement process takes a year. Build relationships first, code second.

Scale: 31K+ buildings, real-time GPS streaming, parallel C++ workers

HR-tech · AI Job Matching · 2024–present

Live

AI-powered, zero-friction job matching platform. Candidate-first, brutally honest analysis. Telegram-native with browser extension. Targeting the $500B job search market.

Python · FastAPI · PostgreSQL · Redis · Claude AI · Tailwind CSS · Stripe · Telegram Bot · Chrome Extension · pgvector

Impact Outcome

Sub-30 second time-to-value from URL paste to full analysis with match score, red flags, salary data, and tailored CV generation

Built multi-LLM job analysis pipeline processing 6 job board APIs

Problem

Job search platforms optimize for engagement, not outcomes. Candidates waste hours on poorly matched listings. No tool gave honest, instant feedback on fit.

What I did

Designed and built end-to-end: job aggregation from 6 APIs (France Travail, Indeed, LinkedIn, WTTJ, Adzuna, Jooble), multi-LLM routing (Claude, OpenAI, DeepSeek, Mistral) with cost optimization, pgvector semantic matching, real-time Telegram bot, Chrome extension, and full application tracking.
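
A minimal sketch of the cost-aware routing policy, with made-up prices and quality tiers:

```python
# Illustrative numbers only; real pricing and tiering differ per model.
PROVIDERS = {
    "deepseek": {"usd_per_1k_tokens": 0.0002, "quality": 2},
    "mistral":  {"usd_per_1k_tokens": 0.0010, "quality": 2},
    "openai":   {"usd_per_1k_tokens": 0.0050, "quality": 3},
    "claude":   {"usd_per_1k_tokens": 0.0080, "quality": 3},
}

def pick_provider(task_complexity: int) -> str:
    """Cheapest provider whose quality tier covers the task."""
    eligible = [(name, p) for name, p in PROVIDERS.items()
                if p["quality"] >= task_complexity]
    return min(eligible, key=lambda e: e[1]["usd_per_1k_tokens"])[0]

assert pick_provider(2) == "deepseek"  # bulk extraction goes to the cheap tier
assert pick_provider(3) == "openai"    # nuanced analysis goes up-market
```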

Key Constraint

EUR200 total launch budget. Every LLM call costs real money — had to build smart routing to keep cost at ~USD0.02/analysis while maintaining quality.

Lesson Learned

Over-engineered the first version with microservices. Collapsed back to a monolith — shipping speed matters more than architecture purity at this stage.

Scale: 10K+ job analyses, 6 API integrations, 6 LLM providers

Impact Outcome

Zero-to-value in under 60 seconds, versus an industry standard of 10-30 minutes

Zero-friction Telegram-first UX — no signup, no forms, instant value

Problem

Traditional job platforms require 10-30 minutes of setup (account, CV upload, preferences) before any value. Most candidates abandon during onboarding.

What I did

Built a Telegram bot where users paste a job URL and get a full analysis in 30 seconds — no account, no forms. Profile is built conversationally over time. Browser extension adds one-click analysis directly on LinkedIn/Indeed.
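
A minimal sketch of the paste-a-URL flow, assuming python-telegram-bot v20+ and a stubbed analysis function:

```python
from telegram import Update
from telegram.ext import Application, ContextTypes, MessageHandler, filters

def analyze_job(url: str) -> str:
    # Stub standing in for the real multi-LLM analysis pipeline.
    return f"Match report for {url} (stub)"

async def on_job_url(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    # No account, no form: the URL itself is the entire onboarding.
    await update.message.reply_text(analyze_job(update.message.text.strip()))

app = Application.builder().token("BOT_TOKEN").build()
# Any message containing a URL entity triggers an analysis.
app.add_handler(MessageHandler(filters.Entity("url"), on_job_url))
app.run_polling()
```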

Key Constraint

Telegram API rate limits, browser extension cross-origin restrictions, maintaining session state without traditional auth.

Lesson Learned

The biggest barrier to adoption is not features — it is friction. Removing the signup wall increased engagement dramatically.

Scale: Targeting 500 founding members

HestiaMatch

Dating · Mobile App · 2024–present

Live

Values-based dating app for family-oriented singles. React Native, 73 screens, proximity crossings, compatibility scoring, Stripe payments, matchmaker B2B mode.

React Native · Expo · Stripe · WebSocket · REST API · i18n

Impact Outcome

73 screens, 96+ builds, cross-platform iOS/Android/Web, full Stripe integration

Shipped 96+ builds of a cross-platform dating app with values-based matching algorithm

Problem

Mainstream dating apps optimize for engagement (more swiping = more ads). Values-oriented singles — people looking for marriage, not hookups — are underserved.

What I did

Built a full React Native app from scratch: 73 screens, weighted compatibility scoring (values 25%, family goals 25%, financial philosophy 20%, intimacy 15%, communication 15%), proximity crossings (Happn-style), real-time WebSocket chat, Stripe payments, physical QR charm cards for offline-to-online conversion, and a B2B matchmaker dashboard.
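
The app itself is TypeScript/React Native; the weighting logic, sketched in Python with placeholder per-dimension scores:

```python
WEIGHTS = {
    "values": 0.25,
    "family_goals": 0.25,
    "financial_philosophy": 0.20,
    "intimacy": 0.15,
    "communication": 0.15,
}

def compatibility(a: dict, b: dict) -> float:
    """Weighted agreement between two profiles.

    Each dimension is a 0-1 score; how those per-dimension scores are
    derived from the questionnaire is the interesting part, elided here.
    """
    return sum(w * (1 - abs(a[k] - b[k])) for k, w in WEIGHTS.items())

alice = {"values": 0.9, "family_goals": 0.8, "financial_philosophy": 0.7,
         "intimacy": 0.6, "communication": 0.9}
bob = {"values": 0.8, "family_goals": 0.9, "financial_philosophy": 0.5,
       "intimacy": 0.7, "communication": 0.8}
print(round(compatibility(alice, bob), 2))  # -> 0.88
```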

Key Constraint

React Native ecosystem moves fast — breaking changes between Expo versions, platform-specific bugs on iOS vs Android, App Store review requirements.

Lesson Learned

Dating apps are a distribution problem, not a technical one. Building the product is 20% of the challenge — getting users to trust and try a new platform is 80%.

Scale: 50+ services (~14K lines), 25+ reusable components

Questus-AI

EdTech · SaaS · 2024–present

Live

AI-powered flashcard & quiz platform. Upload documents, generate questions, learn with spaced repetition. 27+ content parsers, 6 LLM providers, 30+ languages.

Python · FastAPI · PostgreSQL · Redis · Elasticsearch · Vite · HTMX · OpenAI · Tesseract OCR · tiktoken · Docker

Impact Outcome

40+ question types rendered, 30+ languages supported, cost per generation cut by 70% via smart routing

Built 27+ AI content parsers with smart LLM routing across 6 providers

Problem

Existing flashcard tools (Anki, Quizlet) require manual card creation. AI-generated content was low quality and expensive due to single-provider lock-in.

What I did

Engineered a content ingestion pipeline supporting DOCX, PDF, PPTX, images (OCR), YouTube transcripts. Built 27+ pattern parsers (MCQ, scenario, fill-blank, matching, ordering) with confidence scoring. Implemented smart LLM routing across OpenAI, DeepSeek, Mistral, Claude, Grok, and Ollama — selecting model based on task complexity and cost.
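
A toy example of the pattern-parser shape — one MCQ parser returning a parse plus a confidence score; the regex and heuristic are illustrative, not the project's actual parsers:

```python
import re

MCQ = re.compile(
    r"^(?P<stem>.+?\?)\s*\n"                    # question stem ending in '?'
    r"(?P<options>(?:[A-D][).]\s.+\n?){2,4})",  # 2-4 lettered options
    re.MULTILINE,
)

def parse_mcq(text: str):
    m = MCQ.search(text)
    if not m:
        return None, 0.0
    options = re.findall(r"[A-D][).]\s*(.+)", m.group("options"))
    # Confidence grows with the number of well-formed options found.
    confidence = min(1.0, 0.4 + 0.15 * len(options))
    return {"stem": m.group("stem"), "options": options}, confidence

doc = "What is the capital of France?\nA) Lyon\nB) Paris\nC) Nice\n"
print(parse_mcq(doc))  # ({'stem': ..., 'options': ['Lyon', 'Paris', 'Nice']}, 0.85)
```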

Key Constraint

Each LLM provider has different strengths — GPT-4 for nuance, DeepSeek for cost, Mistral for speed. Had to build a routing layer that balances quality vs cost per task type.

Lesson Learned

Token counting and cost tracking per user per provider was essential from day one — without it you cannot price a SaaS sustainably.
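
A minimal sketch of per-user, per-provider cost accounting with tiktoken; the prices are illustrative:

```python
from collections import defaultdict
import tiktoken

PRICE_PER_1K = {"openai": 0.005, "deepseek": 0.0002}  # illustrative USD rates

enc = tiktoken.get_encoding("cl100k_base")
usage = defaultdict(float)  # (user_id, provider) -> USD spent

def track(user_id: str, provider: str, prompt: str, completion: str) -> None:
    tokens = len(enc.encode(prompt)) + len(enc.encode(completion))
    usage[(user_id, provider)] += tokens / 1000 * PRICE_PER_1K[provider]

track("u42", "openai", "Generate 5 flashcards about PostGIS.", "...")
print(dict(usage))
```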

Scale: 635 source files, 6 LLM providers, 30+ languages


Technical Expertise

Level: Expert (verified)

SQL · Microsoft SQL Server · SSIS · Data Warehouse Design · ETL/ELT · Power BI · Apache Kafka · Redis · Data Integration Architecture · Geospatial Algorithms · Real-time Data Pipelines

Level: Solid

Python · PostgreSQL · Docker · Git · GitLab CI/CD · Linux · Debezium (CDC) · FastAPI · Flask · C++ (Modern C++17) · PostGIS · JavaScript (ES6+) · MapLibre GL JS · Leaflet · Server-Sent Events (SSE)

Level: Familiar

React Native · Cloudflare · Azure DevOps · PowerShell · Bash · LLM integration (OpenAI/DeepSeek) · Vector Tiles · GeoJSON · Protocol Buffers

Timeline

2024–now

Independent Data Engineer

Independent Consultant

2024

Data Engineer

Energy sector client (PoC)

2023–2024

Data Engineer

City of Lausanne (data synchronization platform)

2021–2023

Data Engineer

ICRC (Resolve / Family Visit Program / Red Loop)

2019–2021

Data Architect

Brigade de sapeurs-pompiers de Paris (BSPP)

2017–2019

Data Warehouse Developer / BI Developer

Brigade de sapeurs-pompiers de Paris (BSPP)

Let's build.

Send an Email: benjamin.berhault@hotmail.com