Senior Data Engineer (Python)

Berlin | Full-time | €43,200 - €84,000 per year (estimated) | No remote work possible

At a Glance

  • Tasks: Design and implement features for our open-source data library, dlt.
  • Company: Join a dynamic team in Berlin and NYC, backed by top investors and tech veterans.
  • Benefits: Enjoy flexible work options, learning budgets, and subsidized team lunches.
  • Further information: Office-first culture with dedicated 'no meeting days' for focused work.
  • Why this job: Be part of an innovative project that transforms data processing with Python.
  • Qualifications: Fluent in Python, experienced with data libraries, and familiar with GitHub workflows.

The estimated salary is between €43,200 and €84,000 per year.

Who We Are

We are looking for a Software or Data Engineer who is experienced in high-performance Python data processing libraries (often referred to as the Composable Data Stack). You will collaborate directly with our CTO and be part of the core product team.

dlt is an open-source library that automatically creates datasets from messy, unstructured data sources. You can use the library to move data from almost anywhere into the most well-known SQL and vector stores, data lakes, storage buckets, or local engines such as DuckDB, Arrow, or delta-rs. The library automates many cumbersome data engineering tasks and can be used by anyone who knows Python. You can see more details in this Hacker News article.

dltHub is based in Berlin and New York City. It was founded by data and machine learning veterans. We are backed by Foundation Capital, Dig Ventures, and many technical founders from companies such as Datadog, Instana, Hugging Face, MotherDuck, Mesosphere, Matillion, Miro, and Rasa.

Your Tasks and Responsibilities:

dlt is the missing link between the traditional Modern Data Stack and the emerging Pythonic Composable Data Stack: a gateway that creates datasets which the other components can then process. Our mission is to integrate dlt fully with this new, emerging ecosystem in a way that our users love. This means we respect their time, effort, and previous investments in the modern data stack when designing features. To support this mission, your tasks and responsibilities include:

  • You design and implement OSS features that make dlt a gateway to the composable data stack by integrating query engines, transformation frameworks, and table formats with our library.
  • You listen to our users, always paying attention to what they need to go to production with dlt.
  • You work with our customers on commercial projects where dlt is combined with existing “modern data stack” infrastructure.
  • You maintain the open-source project with the team (e.g., reviewing PRs, resolving issues, talking with community contributors).

Who You Are

If you are fascinated by the emerging ecosystem of data libraries in Python, which allows you to do on a single machine what until recently was possible only in the cloud, then you’ll enjoy working with us.

  • You know what duckdb, arrow, datafusion, lancedb, delta-rs, ibis, pyiceberg, sqlglot, kedro, hamilton, and similar Python libraries / pip-installable components do and know when to apply them.
  • You have experience in building data apps or products based on the composable data stack
  • … or you have contributed code to any of the projects above (or similar ones)
  • You know what the so-called “Modern Data Stack” is and appreciate certain aspects of it (e.g., maturity, fitting into enterprise workflows)
  • … and in fact you are interested in combining both worlds.
  • You really like Python and are fluent in writing Python code (e.g., Python typing, unit testing, writing docstrings, etc.)
  • You have a degree in computer science or data science, or equivalent experience
  • You are familiar with GitHub workflows (e.g., pull requests, code reviews, CI/CD services, etc.)

Nice to Have:

  • You are based in Berlin and willing to work in our office regularly
  • You have a hacker nature and you love optimizing things
  • Experience with DevOps (e.g., CI systems like GitHub Actions, Docker, Kubernetes, AWS/GCP/Digital Ocean, etc.)
  • Experience with machine learning (e.g., the toolset, the workflows, practical applications, etc.)

What We Offer

In our work culture, we value each other’s autonomy and efficiency. We have set hours for communication and deep work. We like automation, so we automate our work before we automate the work of others.

  • We are an office-first company but give you plenty of opportunities for deep work and work from home. We have dedicated "no meeting days" to help the team focus on their most impactful work.
  • As we often work from the Berlin office, we cover your public transportation ticket.
  • We are deeply committed to your personal and professional growth, so we have an annual budget for learning and development.
  • We offer regular subsidized team lunches and an Urban Sports Club membership.
  • We also have an ESOP for employees, depending on their role and dedication. We provide an option to increase your ESOP if you grow with us.

Senior Data Engineer (Python) Employer: dltHub

At dlt, we pride ourselves on fostering a collaborative and innovative work environment where your contributions directly impact the evolution of our open-source data processing library. Located in the vibrant city of Berlin, we offer a unique blend of autonomy and support, with dedicated 'no meeting days' to enhance productivity, an annual budget for personal development, and regular team lunches to strengthen camaraderie. Join us to be part of a forward-thinking team that values your growth and creativity while working at the forefront of the emerging Pythonic Composable Data Stack.


Contact details:

dltHub Recruiting Team

StudySmarter Expert Advice 🤫

Here is how we think you could land the Senior Data Engineer (Python) role

Tip Number 1

Familiarize yourself with the specific Python libraries mentioned in the job description, such as DuckDB, Arrow, and delta-rs. Being able to discuss these tools and their applications during your interview will show that you have a strong understanding of the composable data stack.

Tip Number 2

Engage with the open-source community around dlt and similar projects. Contributing to discussions or even small code contributions can demonstrate your passion and commitment to the field, making you a more attractive candidate.

Tip Number 3

Prepare to discuss your experience with GitHub workflows, especially pull requests and code reviews. Highlighting your familiarity with these processes will reassure us that you can seamlessly integrate into our team.

Tip Number 4

Showcase any previous projects where you've built data applications or worked with modern data stacks. Real-world examples of your work will help us see how you can contribute to our mission at dlt.

We believe you need these skills to excel as a Senior Data Engineer (Python)

High Performance Python Data Processing
Composable Data Stack
Open Source Software Development
Data Integration
SQL and Vector Stores
Data Lakes
DuckDB

Some tips for your application 🫡

Understand the Role: Make sure you fully understand the responsibilities and requirements of the Senior Data Engineer position. Highlight your experience with Python data processing libraries and your familiarity with the Composable Data Stack in your application.

Tailor Your CV: Customize your CV to emphasize relevant skills and experiences that align with the job description. Include specific projects or contributions you've made to open-source data libraries, especially those mentioned in the job listing.

Craft a Compelling Cover Letter: Write a cover letter that showcases your passion for data engineering and your understanding of the modern data stack. Mention how your background and interests align with the company's mission and values.

Showcase Your Technical Skills: In your application, provide examples of your proficiency in Python, GitHub workflows, and any relevant DevOps experience. Consider including links to your GitHub profile or any projects that demonstrate your capabilities.

How to prepare for an interview at dltHub

Show Your Python Expertise

Make sure to highlight your experience with high-performance Python data processing libraries. Be prepared to discuss specific projects where you've utilized libraries like DuckDB, Arrow, or delta-rs, and how they contributed to the success of your data engineering tasks.

Understand the Composable Data Stack

Demonstrate your knowledge of the composable data stack and its components. Be ready to explain how you would integrate various query engines and transformation frameworks with dlt, showcasing your understanding of both traditional and modern data stacks.

Engage with User Needs

Since the role involves listening to users, prepare examples of how you've gathered user feedback in past projects. Discuss how you translated that feedback into actionable features or improvements, emphasizing your commitment to user-centric design.

Familiarize Yourself with Open Source Contributions

As maintaining the open-source project is part of the job, be ready to talk about your experience with GitHub workflows. Share any contributions you've made to open-source projects, including code reviews and issue resolutions, to demonstrate your collaborative spirit.