Benjamin Berhault
Solo Founder & Full-Stack Engineer. Building 6 products across EdTech, HR-tech, Dating, GovTech, PropTech, and Data Infrastructure. Venture studio operator — own and operate, not sell.
Achievements
6 products built, launched, and scaled across diverse industries.
Real Estate Intelligence
PropTech · Data Platform
Interactive map platform scoring French property transactions as investment opportunities. Official DVF data, PostGIS analytics, vector tile visualization.
Impact Outcome
Building-level investment scoring across 2 departments, extensible to all 96 metropolitan French departments
Built investment scoring engine on official French government property data
French property investors rely on listings (biased) or gut feeling. Official transaction data (DVF) is public but raw and unusable without significant processing.
What I did
Built a full data pipeline: Dagster orchestration, dbt transforms, PostGIS analytics computing commune-level statistics (percentiles, medians, yield estimates). Generated vector tiles with Tippecanoe for MapLibre rendering. Scoring algorithm ranks every transaction 0-100 relative to its local market — a property at the 15th percentile in its commune scores 85 (opportunity).
Constraints
DVF data has inconsistencies — missing coordinates, multi-lot transactions, price anomalies. Had to build robust cleaning pipelines before scoring was meaningful.
What I Learned
Relative scoring (vs local market) is far more actionable than absolute prices. A EUR150K apartment can be a great deal in one commune and overpriced in the next.
Scale: Millions of DVF transactions, building-level geospatial resolution
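The relative-scoring idea can be sketched in a few lines, assuming a plain list of comparable EUR/m2 sale prices per commune (the production version computes percentiles in dbt/PostGIS, not in Python, and the column names here are illustrative):

```python
from bisect import bisect_left

def relative_score(price_per_m2: float, commune_prices: list[float]) -> int:
    """Score a transaction 0-100 against its local market.

    A price at the 15th percentile of its commune scores 85: the
    cheaper a property is relative to comparable local sales, the
    higher its opportunity score.
    """
    ranked = sorted(commune_prices)
    # Fraction of local transactions priced at or below this one.
    percentile = 100 * bisect_left(ranked, price_per_m2) / len(ranked)
    return round(100 - percentile)

# A commune with sales ranging from 2000 to 4100 EUR/m2:
prices = [2000, 2200, 2500, 2800, 3000, 3200, 3500, 3800, 4000, 4100]
relative_score(2100, prices)  # → 90: well below local prices
```

Scoring against the local distribution rather than the raw price is what makes the EUR150K example above work: the same function gives very different scores for the same price in two different communes.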
Kyros
Data Infrastructure · Open Source
Progressive, self-hosted data platform — from DuckDB on a laptop to full Spark/Kafka/Flink enterprise stack. 20+ composable services, 5 deployment levels. Apache 2.0.
Impact Outcome
20+ composable services, 5 deployment levels, 80% of Databricks at 5% of the cost
Designed progressive 5-level data platform architecture — DuckDB to Spark/Kafka
Small companies either overspend on Databricks/Snowflake (USD2-5K+/mo) or waste weeks assembling Docker services manually. No tool offered an honest progressive path.
What I did
Architected a composable platform with 5 deployment levels (Level 0: DuckDB+dbt at USD0/mo → Level 4: Spark+Kafka+Flink+Keycloak at USD150-300/mo). Built a Python CLI for interactive component selection, a docker-compose generator, and 20+ pre-configured services with health checks and resource limits. Apache 2.0 licensed.
Constraints
Ensuring 20+ services work together reliably across 5 different configurations. Health checks, resource limits, and dependency ordering are critical and unglamorous work.
What I Learned
Most companies will never need Level 3+. The value is in giving them an honest framework to know when to upgrade — not pushing them to over-engineer from day one.
Scale: 20+ Docker services, 31 Dockerfiles, 5 preset environments
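The progressive-level model can be sketched as a strictly additive service map. Level 0 and Level 4 match the description above; the intermediate levels shown here are illustrative placeholders, not Kyros's actual assignments:

```python
# Each level strictly extends the one below it, so upgrading never
# means replacing the stack, only adding services on top of it.
LEVELS = {
    0: ["duckdb", "dbt"],                        # from the description above
    1: ["postgres", "dagster"],                  # levels 1-3: placeholders
    2: ["minio", "trino"],
    3: ["superset"],
    4: ["spark", "kafka", "flink", "keycloak"],  # from the description above
}

def services_for(level: int) -> list[str]:
    """All services needed to run the platform at a given level."""
    return [svc for lv in range(level + 1) for svc in LEVELS[lv]]

services_for(0)  # → ['duckdb', 'dbt']
```

Making levels additive is what turns "when should we upgrade?" into a concrete, reversible decision rather than a migration project.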
CMS Platform
GovTech · Emergency Services
Real-time geospatial monitoring for emergency dispatch. C++ routing engine, Kafka streaming, GPS tracking, coverage analysis with building-level isochrones.
Impact Outcome
Building-level coverage analysis with real-time GPS tracking, sub-second route computations
Built C++ real-time routing engine for emergency services coverage analysis
French fire services (SDIS/BSPP) need to know if every building in their territory can be reached within response time targets. Existing tools are slow and batch-oriented.
What I did
Built a C++ routing engine using RoutingKit on OSM road graphs with contraction hierarchies, forbidden turns, and one-way streets. Kafka streams GPS positions in real-time from dispatch systems. Coverage isochrones computed per-building (not per-zone) with parallel workers. Frontend renders 31K+ buildings with viewport-based culling on MapLibre.
Constraints
C++ binary distribution — cannot source-compile in CI without exposing RoutingKit internals. Had to pre-compile and distribute binaries.
What I Learned
GovTech sales cycles are 12-18 months minimum. The technical product can be ready in weeks, but the procurement process takes a year. Build relationships first, code second.
Scale: 31K+ buildings, real-time GPS streaming, parallel C++ workers
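Independent of the C++/RoutingKit implementation, the per-building coverage idea boils down to a cost-bounded multi-source Dijkstra over the road graph: run it from every station's nearest road node, and a building is covered if that node is reached within the response-time target. A minimal sketch on a toy graph (node ids and travel times hypothetical):

```python
import heapq

def coverage(graph: dict, sources: list, limit: float) -> dict:
    """Cost-bounded multi-source Dijkstra.

    graph:   {node: [(neighbour, travel_seconds), ...]}
    sources: road nodes adjacent to fire stations
    limit:   response-time target in seconds
    Returns {node: best_time} for every node reachable within limit.
    """
    best = {s: 0 for s in sources}
    heap = [(0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        t, u = heapq.heappop(heap)
        if t > best.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was found
        for v, w in graph.get(u, []):
            nt = t + w
            if nt <= limit and nt < best.get(v, float("inf")):
                best[v] = nt
                heapq.heappush(heap, (nt, v))
    return best

# Toy road graph: station at node 0, 300 s response target.
g = {0: [(1, 120), (2, 200)], 1: [(3, 120)], 2: [(3, 50)], 3: [(4, 400)]}
coverage(g, [0], 300)  # node 4 falls outside the isochrone
```

Contraction hierarchies make exactly this query fast enough to repeat per building at national road-graph scale, which is what lifts the analysis from per-zone to per-building.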
Jobko
HR-tech · AI Job Matching
AI-powered, zero-friction job matching platform. Candidate-first, brutally honest analysis. Telegram-native with browser extension. Targeting the $500B job search market.
Impact Outcome
Sub-30 second time-to-value from URL paste to full analysis with match score, red flags, salary data, and tailored CV generation
Built multi-LLM job analysis pipeline processing 6 job board APIs
Job search platforms optimize for engagement, not outcomes. Candidates waste hours on poorly matched listings. No tool gave honest, instant feedback on fit.
What I did
Designed and built end-to-end: job aggregation from 6 APIs (France Travail, Indeed, LinkedIn, WTTJ, Adzuna, Jooble), multi-LLM routing (Claude, OpenAI, DeepSeek, Mistral) with cost optimization, pgvector semantic matching, real-time Telegram bot, Chrome extension, and full application tracking.
Constraints
EUR200 total launch budget. Every LLM call costs real money — had to build smart routing to keep cost at ~USD0.02/analysis while maintaining quality.
What I Learned
Over-engineered the first version with microservices. Collapsed back to a monolith — shipping speed matters more than architecture purity at this stage.
Scale: 10K+ job analyses, 6 API integrations, 6 LLM providers
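A minimal sketch of cost-aware provider routing: send each task to the cheapest model whose quality tier can handle it. The prices and quality tiers below are illustrative placeholders, not the platform's real routing tables:

```python
# Illustrative per-1K-token prices and quality tiers; real figures
# change often and are tracked per provider in production.
PROVIDERS = {
    "deepseek": {"cost": 0.0002, "quality": 2},
    "mistral":  {"cost": 0.0010, "quality": 2},
    "openai":   {"cost": 0.0050, "quality": 3},
    "claude":   {"cost": 0.0080, "quality": 3},
}

def route(task_complexity: int) -> str:
    """Cheapest provider whose quality tier covers the task.

    Routine extraction goes to budget models; nuanced work such as
    red-flag analysis or tailored CV generation goes to frontier
    models only when the cheaper tier cannot handle it.
    """
    eligible = {name: p for name, p in PROVIDERS.items()
                if p["quality"] >= task_complexity}
    return min(eligible, key=lambda n: eligible[n]["cost"])

route(2)  # → 'deepseek'
route(3)  # → 'openai'
```

On a EUR200 budget, the routing layer rather than any single model choice is what keeps the blended cost near USD0.02 per analysis.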
Zero-friction Telegram-first UX — no signup, no forms, instant value
Traditional job platforms require 10-30 minutes of setup (account, CV upload, preferences) before any value. Most candidates abandon during onboarding.
What I did
Built a Telegram bot where users paste a job URL and get a full analysis in 30 seconds — no account, no forms. Profile is built conversationally over time. Browser extension adds one-click analysis directly on LinkedIn/Indeed.
Constraints
Telegram API rate limits, browser extension cross-origin restrictions, maintaining session state without traditional auth.
What I Learned
The biggest barrier to adoption is not features — it is friction. Removing the signup wall increased engagement dramatically.
Scale: Targeting 500 founding members
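The sessionless flow can be sketched by keying state on the Telegram chat id, which Telegram already authenticates, so no signup or password is ever needed. The handler and profile slots below are hypothetical simplifications of the real bot:

```python
from collections import defaultdict

# Telegram authenticates users for us, so the chat id is the only
# "session": no account, no cookie, no login wall.
profiles: dict[int, dict] = defaultdict(dict)

def on_message(chat_id: int, text: str) -> str:
    """Enrich the profile one answer at a time instead of
    front-loading a signup form."""
    profile = profiles[chat_id]
    if text.startswith("http"):
        return f"Analyzing job posting against {len(profile)} known preferences..."
    # Hypothetical: each free-text answer fills one profile slot.
    slot = f"answer_{len(profile) + 1}"
    profile[slot] = text
    return "Noted. Paste a job URL anytime."

on_message(42, "I want remote-first roles")
on_message(42, "https://example.com/job/123")
```

The design choice is the point: value (an analysis) arrives on the very first message, and the profile deepens as a side effect of normal conversation.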
HestiaMatch
Dating · Mobile App
Values-based dating app for family-oriented singles. React Native, 73 screens, proximity crossings, compatibility scoring, Stripe payments, matchmaker B2B mode.
Impact Outcome
73 screens, 96+ builds, cross-platform iOS/Android/Web, full Stripe integration
Shipped 96+ builds of a cross-platform dating app with values-based matching algorithm
Mainstream dating apps optimize for engagement (more swiping = more ads). Values-oriented singles — people looking for marriage, not hookups — are underserved.
What I did
Built a full React Native app from scratch: 73 screens, weighted compatibility scoring (values 25%, family goals 25%, financial philosophy 20%, intimacy 15%, communication 15%), proximity crossings (Happn-style), real-time WebSocket chat, Stripe payments, physical QR charm cards for offline-to-online conversion, and a B2B matchmaker dashboard.
Constraints
React Native ecosystem moves fast — breaking changes between Expo versions, platform-specific bugs on iOS vs Android, App Store review requirements.
What I Learned
Dating apps are a distribution problem, not a technical one. Building the product is 20% of the challenge — getting users to trust and try a new platform is 80%.
Scale: 50+ services (~14K lines), 25+ reusable components
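The weighted scoring model can be sketched directly from the weights above; the 0-10 per-dimension scale and the distance metric are assumptions for illustration, not the app's exact formula:

```python
# Dimension weights from the scoring model described above.
WEIGHTS = {
    "values": 0.25,
    "family_goals": 0.25,
    "financial_philosophy": 0.20,
    "intimacy": 0.15,
    "communication": 0.15,
}

def compatibility(a: dict, b: dict) -> float:
    """Weighted similarity between two profiles.

    Assumes each dimension holds a 0-10 self-assessment; closeness
    on a dimension contributes up to its weight to a 0-100 score.
    """
    score = 0.0
    for dim, w in WEIGHTS.items():
        closeness = 1 - abs(a[dim] - b[dim]) / 10
        score += w * closeness
    return round(100 * score, 1)

alice = {"values": 9, "family_goals": 8, "financial_philosophy": 6,
         "intimacy": 7, "communication": 8}
bob   = {"values": 8, "family_goals": 9, "financial_philosophy": 7,
         "intimacy": 7, "communication": 5}
compatibility(alice, bob)  # → 88.5
```

Weighting values and family goals at 50% combined is what encodes the product thesis: alignment on life direction matters more than surface-level attraction signals.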
Questus-AI
EdTech · SaaS
AI-powered flashcard & quiz platform. Upload documents, generate questions, learn with spaced repetition. 27+ content parsers, 6 LLM providers, 30+ languages.
Impact Outcome
40+ question types rendered, 30+ languages supported, cost per generation reduced by 70% via smart routing
Built 27+ AI content parsers with smart LLM routing across 6 providers
Existing flashcard tools (Anki, Quizlet) require manual card creation. AI-generated content was low quality and expensive due to single-provider lock-in.
What I did
Engineered a content ingestion pipeline supporting DOCX, PDF, PPTX, images (OCR), YouTube transcripts. Built 27+ pattern parsers (MCQ, scenario, fill-blank, matching, ordering) with confidence scoring. Implemented smart LLM routing across OpenAI, DeepSeek, Mistral, Claude, Grok, and Ollama — selecting model based on task complexity and cost.
Constraints
Each LLM provider has different strengths — GPT-4 for nuance, DeepSeek for cost, Mistral for speed. Had to build a routing layer that balances quality vs cost per task type.
What I Learned
Token counting and cost tracking per user per provider was essential from day one — without it you cannot price a SaaS sustainably.
Scale: 635 source files, 6 LLM providers, 30+ languages
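The parser-with-confidence-scoring idea can be sketched as a registry of parsers that each return a candidate structure plus a confidence, with the highest-confidence interpretation winning. The two parsers below are hypothetical stand-ins for the real 27+:

```python
import re

def parse_mcq(text: str) -> tuple[dict, float]:
    """Hypothetical MCQ parser: confidence grows with the number
    of recognisable answer options found in the text."""
    options = re.findall(r"^[A-D]\)", text, re.MULTILINE)
    conf = min(1.0, len(options) / 4)
    return {"type": "mcq", "options": len(options)}, conf

def parse_fill_blank(text: str) -> tuple[dict, float]:
    """Hypothetical fill-in-the-blank parser keyed on '____' gaps."""
    blanks = text.count("____")
    conf = 1.0 if blanks else 0.0
    return {"type": "fill_blank", "blanks": blanks}, conf

PARSERS = [parse_mcq, parse_fill_blank]

def best_parse(text: str) -> dict:
    """Run every registered parser and keep the most confident
    interpretation of the source content."""
    results = [p(text) for p in PARSERS]
    parsed, conf = max(results, key=lambda r: r[1])
    return {**parsed, "confidence": conf}

best_parse("Capital of France?\nA) Paris\nB) Rome\nC) Madrid\nD) Bern")
```

Returning a confidence rather than a hard match is what lets many narrow parsers coexist: each can be simple and aggressive, and the selection step arbitrates.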
Technical Expertise
Timeline
Independent Data Engineer · Independent Consultant
Data Engineer · Energy sector client (PoC)
Data Engineer · City of Lausanne (data synchronization platform)
Data Engineer · ICRC (Resolve / Family Visit Program / Red Loop)
Data Architect · Brigade de sapeurs-pompiers de Paris (BSPP)
Data Warehouse Developer / BI Developer · Brigade de sapeurs-pompiers de Paris (BSPP)