Data Scientist
We’re looking for someone who thrives at the intersection of messy data, sharp analysis, and fast-moving product.
Most of the data we deal with isn’t clean—it lives in SharePoint drives, project databases, PDFs, Word documents, PowerPoints, and Excel sheets. Your job is to make sense of that chaos and turn it into structured insight that powers our AI systems.
You’ll work directly with the founders and engineers to design how data becomes intelligence—building the pipelines, evaluation frameworks, and review modules that make large language models useful and dependable. From spotting cross-document inconsistencies to defining what “good” looks like in a complex project, you’ll help shape the backbone of how our product learns and reasons.
What you’ll do
This role is about creating clarity out of complexity—structuring data, validating AI behavior, and enabling review modules that scale.
Build pipelines that transform PDFs, docs, spreadsheets, and database exports into analysis-ready formats
Design experiments to evaluate prompts, retrieval strategies, and cross-document reasoning
Develop benchmarks, golden datasets, and metrics to measure hallucination, consistency, and reliability
Encode expert domain knowledge into structured rules and testable review modules
Partner with AI and software engineers to embed data and evaluation into production workflows
Work with domain experts to capture lessons learned and translate them into reusable knowledge assets
Treat prompts, evaluations, and datasets as part of the same system: designed, tested, and versioned together
Who you are
You’re a builder who thinks in systems. You’re comfortable with ambiguity, but rigorous in how you test and measure. You like solving problems where the data is messy, the stakes are high, and the answers aren’t obvious.
You’ve built real data workflows—turning unstructured sources into usable pipelines
You’ve worked with LLMs, prompts, and retrieval systems in applied contexts
You understand evaluation: not just whether a model works, but how to measure what “working” means
You’re fluent in Python and SQL, and familiar with tools like Dagster, Opik, LangChain, or equivalent
You’ve designed or run experiments—statistical, heuristic, or hybrid—and know how to get clear results
You can work across text, numbers, and structure—making sense of diverse formats and systems
You communicate clearly: your code, your metrics, and your reasoning are easy to follow
You are fluent in both Dutch and English, and can communicate confidently in writing and speech
You have at least 3 years of experience. What really matters is your ability to learn fast, own big problems, and make our AI product smarter and more reliable every week.