Member of Engineering, Pre-training/data (Remote)

Full-time, no home office possible

About Poolside

We are software's leading AI research lab.

We are a frontier lab focused on building the most capable models and systems to support them. Our models are generally capable and are purpose-built specifically to excel at software engineering. Our proprietary approach and techniques allow our models to learn like the best developers through trial and error, navigating ambiguity to discover working solutions.

About the Role

You would be working on our data team focused on the quality of the datasets being delivered for training our models. This is a hands-on role where your #1 mission would be to improve the quality of the pretraining datasets by leveraging your previous experience, intuition and training experiments. This includes synthetic data generation and data mix optimization.

You would be closely collaborating with other teams like Pre-training, Fine-tuning and Product to define high-quality data both quantitatively and qualitatively.

Staying in sync with the latest research in dataset design and pretraining is key to success in this role. You would regularly propose original research initiatives, run short, time-bounded experiments, and show strong engineering competence when deploying your solutions in production. Because the volumes of data to process are massive, you'll have a performant distributed data pipeline and a large GPU cluster at your disposal.

Why you should join

  • Code the future of AI-powered development – Build the scalable platform that powers Poolside's fine-tuning efforts, directly impacting how our foundational models learn and improve

  • Series C funding imminent – Join a $500M+ Series B startup that's about to close an even larger round, with massive compute resources and runway for years

  • Elite engineering culture – 75% of the 120-person team is engineering, working alongside ex-GitHub CTO Jason Warner and top-tier talent from Snap, GitHub, and other leading companies

Your Mission

To deliver massive-scale, highest-quality datasets of natural language and source code for training Poolside models.

Responsibilities

  • Follow the latest research related to LLMs and data quality in particular. Be familiar with the most relevant open-source datasets and models

  • Work closely with other teams such as Pre-training, Fine-tuning or Product to ensure short feedback loops on the quality of the models delivered

  • Suggest, conduct and analyze data ablations or training experiments that aim to improve the quality of the datasets generated via quantitative insights


Contact person:

Poolside HR Team
