About us
We’re Cortea, a Berlin startup transforming audits with AI. Manual, document-heavy audits waste expert time; we automate the repetitive parts so experts can focus on judgment. We are backed by top-tier VCs with more than €10m in funding, and we have a live product and paying customers. We value first-principles thinking, speed, trust, and kindness. We build side by side in our Berlin office.
Your Role
We are looking for an Engineer with strong data engineering and AI systems experience to build the data, evaluation, and observability foundation for production-grade LLM agents used in complex audit workflows.
This role sits at the intersection of backend engineering, data engineering, AI infrastructure, and LLM operations. You will work hands-on in our backend and agent architecture, building the systems that help us evaluate, monitor, debug, optimize, and continuously improve AI agents in production.
This is not a traditional analytics, BI, or dashboarding role. You should expect to write production code, design infrastructure, work inside backend systems, and directly improve the quality, cost, reliability, and performance of LLM-based agents.
What you’ll do
You will help build and operate the technical infrastructure around our AI agents, with a focus on data infrastructure, evaluation, observability, and optimization. Your work will include:
Building online and offline evaluation systems for LLM agents, including pipelines that use golden datasets, ground-truth data, human review workflows, and experiment results.
Creating automated quality gates so changes to prompts, context, models, or agent logic can be tested before reaching production.
Analyzing large volumes of agent traces and executions to identify failure modes, quality regressions, latency issues, reliability gaps, and cost optimization opportunities.
Working with columnar data stores and analytical databases such as BigQuery, ClickHouse, or similar technologies.
Building reliable data retention and replay mechanisms for long-term analysis of production agent behavior.
Creating observability tooling for trace analysis, experiment monitoring, production dashboards, logging, tracing, and debugging.
Working inside our core backend and agent architecture, including building new agents or improving existing agents when needed.
Qualifications
You will fit into this role if you:
Have strong Python and/or backend engineering experience.
Have strong SQL skills and are comfortable working with large datasets.
Have deployed and operated systems in the cloud, ideally on GCP.
Have practical experience designing data pipelines, ETL/ELT workflows, event-processing systems, or feedback loops for production data.
Are comfortable working with analytical databases, data warehouses, columnar stores, and high-volume event or trace data.
Understand system design, reliability, observability, monitoring, logging, debugging, and operational trade-offs.
Can work in complex existing systems and quickly build a mental model of how they operate.
Bring senior-level engineering judgment: you can make architectural decisions, communicate trade-offs, and build systems that other engineers can extend.
Are comfortable with ambiguity, able to reason from first principles, and excited to build infrastructure for AI systems that are actively used in production.
Nice-to-haves:
Building infrastructure around LLM-based products or agentic systems, including optimizing LLM usage, context windows, reasoning tokens, or model selection.
Working with production traces from complex distributed systems.
Building internal platforms for engineers, domain experts, or operations teams.
Using workflow orchestration systems such as Temporal or similar.
Familiarity with audit, finance, compliance, or other high-accuracy domains.
Experience in an early-stage startup or fast-moving engineering environment.
No one checks every box. If you’ve shipped retrieval systems and like owning evaluations and pipelines, let’s talk.
What we offer
High impact & growth: Shape strategy at a scaling AI startup from day one
Mission-driven culture: Ambitious team valuing first-principles thinking and bold ideas
Attractive compensation: Competitive salary plus significant equity
Personal development: Learning budget for courses and conferences
Startup perks: Flexible vacation, team lunches, retreats, central Berlin office
Interview process
First Call — Intro to Cortea with our Talent Partner Adriana
Second Call — Technical interview with Jendrik
Third Call — Deep dive into our culture with our Co-Founder Philipp
On-site Half-Day (Berlin) — Meet the team and work on a real problem together
We’re an equal-opportunity team and encourage women and underrepresented groups to apply.