Research Engineer, Environment Scaling

Full-time, no remote work possible

Requirements

  • Have experience with fine-tuning large language models for specific domains or real-world use cases and/or domain expertise in an area where we would like to make our models more useful
  • Have experience with reinforcement learning, reward design, or training data curation for LLMs
  • Are comfortable managing technical vendor relationships and iterating quickly on feedback
  • Find value in reading through datasets to understand them and spot issues
  • Have strong project management and interpersonal skills
  • Are passionate about making AI more useful and accessible across different industries
  • Are excited about a role that includes a combination of ML research, data operations, and project management
  • (Desirable) Have experience training production ML systems
  • (Desirable) Are familiar with distributed systems and cloud infrastructure
  • (Desirable) Have domain expertise in an area where we would like to make our models more useful
  • (Desirable) Have experience working with external vendors or technical partners
  • Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience
  • Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.

What the job involves

  • The Environment Scaling team is a team of researchers and engineers whose goal is to improve the intelligence of our public models for novel verticals and use cases
  • The team builds the training environments that fuel RL at scale
  • This is a unique role that combines hands-on ML research, data operations, and project management to improve our models
  • You'll own the end-to-end process of creating RL environments for new capabilities: identifying high-value tasks, designing reward signals, managing vendor relationships, and measuring impact on model performance
  • Improve and execute our fine-tuning strategies for adapting Claude to new domains and tasks
  • Manage technical relationships with external data vendors, including evaluation of data quality and reward design
  • Collaborate with domain experts to design data pipelines and evaluations
  • Explore novel ways of creating RL environments for high-value tasks
  • Develop and improve QA frameworks to catch reward hacking and ensure environment quality
  • Partner with other RL research teams and product teams to translate capability goals into training environments and evals

Contact:

Anthropic Recruiting Team