Senior Software Engineer - Observability Visibility

Full-time, no remote work possible
Datadog

The Observability Visibility SRE Team is part of the Observability and Resilience Enablement group within the SRE/Security organization. Observability and Resilience Enablement focuses on closing the loop between how Datadog engineers detect and respond to issues and incidents and how those learnings translate into measurable risk reduction and lower customer impact. The Observability Visibility team carries the organization's 100% visibility priority, defining observability and reliability baselines and ensuring services consistently meet them by default through scalable, automated, and sustainable solutions.


At Datadog, we place value in our office culture - the relationships and collaboration it builds and the creativity it brings to the table. We operate as a hybrid workplace to ensure our Datadogs can create a work‑life harmony that best fits them.

What You'll Do
  • Define and evolve observability and resilience baselines, ensuring alignment with measurable risk reduction goals across Datadog services.
  • Measure service compliance against established standards, assess risk and remediation complexity, and drive sustainable solutions to close identified gaps.
  • Design and deliver scalable observability and reliability capabilities across the software development lifecycle, leveraging automation and AI‑driven solutions where appropriate so that service owners meet established standards by default. Partner closely with platform, SRE, product, and engineering teams to ensure adoption and sustained coverage.
  • Provide technical leadership and day‑to‑day coaching to team members, accelerating their growth through design reviews, collaborative problem‑solving and operational excellence best practices.
Who You Are
  • You have 5+ years of experience in software engineering, site reliability engineering, or a related discipline supporting production systems at scale.
  • You have hands‑on experience with observability and resilience practices, including expertise in identifying, analyzing, and mitigating service and system failure modes.
  • You have strong programming skills in Go and/or Python and can design and build reliable, maintainable systems.
  • You are comfortable navigating complex technical challenges and proposing efficient, scalable, and easy‑to‑adopt solutions.
  • You have experience delivering AI‑enabled software features end‑to‑end, including design, evaluation, deployment, and monitoring, and can articulate when AI is the appropriate solution and when it is not.
  • You have strong communication, collaboration, and mentorship skills, with experience influencing technical direction across multiple engineering teams.
Benefits and Growth
  • New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
  • Continuous professional development, product training, and career pathing
  • Intradepartmental mentor and buddy program for in‑house networking
  • An inclusive company culture and opportunities to participate in Community Guilds (Datadog employee resource groups)
  • Access to Inclusion Talks and internal learning opportunities
  • Free, global mental health benefits for employees and dependents age 6+
  • Competitive global benefits
Benefits and Growth listed above may vary based on the country of your employment and the nature of your employment with Datadog.

Datadog

Contact details:

Datadog Recruiting Team