Job description

About the company

Finom is a European technology startup based in Amsterdam, building an integrated financial platform for entrepreneurs. The product combines banking, accounting, financial administration, and invoicing into one mobile-first experience designed to make business finances simpler and more efficient.

The company recently raised €115 million in Series C equity funding, bringing its total funding to about $346 million. That round followed a $105 million growth investment from General Catalyst, an investor that has supported the company since 2021 and is also known for backing firms such as Airbnb, HubSpot, KAYAK, and Stripe.

Beyond core banking services, Finom offers invoicing and a growing set of AI-supported accounting capabilities. The company is expanding across major European markets, including Germany, France, the Netherlands, Italy, and Spain.

Finom aims to give entrepreneurs a better operating experience while also creating meaningful impact for employees. The team values practical innovation, fast execution, strong research, and solutions that help users, partners, employees, and the business as a whole.

Team and mission

You will join the AI team, which owns the company’s AI products and underlying technology. This group builds AI features across the organization, including an AI financial co-pilot, a voice agent, and internal AI-enabled workflows.

The core idea behind the role is that AI systems are only as strong as the evaluation process behind them. Your responsibility will be to own that evaluation loop across AI products, covering quality checks before launch, monitoring after launch, and continuous improvement over time.

You will collaborate closely with the AI Quality lead, Igor Kolodkin, and work with AI engineers, Product, and domain specialists across the business. The main tools in use include Databricks, DeepEval, and Claude Code.

What you will do

Take ownership of, and further develop, the offline evaluation framework across products, including dataset design for capability and regression testing, judge setup, and metric definition.
Create and maintain dashboards for live quality tracking, such as resolution rate, CSAT, thumbs up/down, LLM-as-judge outputs, error rate, and latency.
Turn production feedback into action by identifying failure patterns in real user traffic, converting them into regression scenarios, and suggesting improvements to Product and subject-matter experts.
Improve the reliability of the evaluation approach by addressing judge inconsistency, instability, and non-deterministic behavior.
Convert analysis into practical decisions through weekly discussions, clear prioritization, and trade-off recommendations rather than dashboards that exist only for reporting.

Must-have experience

Strong working knowledge of Python and SQL, with the ability to take an analysis from start to finish independently.
A solid grounding in statistics, including sampling, hypothesis testing, variance, and interpretation of noisy metrics.
An analytical approach that begins with the business problem rather than the tool or method.
At least 3 years of experience in analyst or data scientist roles, with at least 1 year in a product-focused environment.

Preferred experience

Background in quality analytics for machine learning systems such as ranking, recommendations, or classification.
Practical experience assessing LLM-based applications, including RAG, agents, tool use, and judges.
Experience creating LLM agents through side projects, experiments, or personal builds.

How the team works

AI-assisted development is the standard way of working, not an optional extra.
Claude Code is the primary tool for SQL, Python, analysis, dashboards, and internal scripts.
The company is looking for analysts who are already comfortable exploring AI-powered coding, or who are eager to build that fluency quickly.
What matters most is the quality of what you ship and the clarity of your thinking.
If this approach energizes you rather than concerns you, you are likely to fit in well.

What you will get

The chance to influence the product in a meaningful way.
Opportunities to grow alongside a company on an upward trajectory, with resources for continued professional and personal development.
Flexibility to work remotely or in a hybrid setup from anywhere within Europe.
Participation in the stock options program, available to all team members from junior employees to founders.
A supportive, modern, friendly, and eco-conscious workplace focused on well-being and success.
Access to the Work & Swim program, which includes one month in a comfortable corporate apartment in Cyprus.

Equal opportunity and hiring note

Finom is an equal opportunity employer and welcomes applications from people of all backgrounds. The company does not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, disability, or other protected characteristics.

The hiring process may use AI tools to assist with tasks such as application review, resume analysis, and response assessment, including checking for inconsistencies or verification signals. These tools support recruiters but do not replace human judgment, and final hiring decisions are made by people. Candidates can contact the company for more information about how their data is processed.

Product Data Scientist — AI Evaluation & Quality

Where you will work