SVP, Site Reliability Engineering Domain Lead, SRE & Governance, Group Technology

St. Gallen, Full-time, No remote work possible
United States Digital Space LLC
Roles & Responsibilities

- Manage a large team of Production Support Personnel across multiple geographical locations covering Applications and Infrastructure
- Ensure SLAs on Alerts and Incidents (Application & Infra) are proactively managed and reduce Mean Time To Recover (MTTR) by 20%
- Ensure strict adherence to Standard Operating Procedures for recovery across Application and Infrastructure layers
- Deliver a playbook for onboarding new tasks and activities covering both Application and Infrastructure support models
- Identify opportunities to automate Production support activities (App & Infra) and reduce manual interventions
- Drive application and infrastructure improvements including performance, capacity, resilience, and operational stability; eliminate toil through automation
- Automate manual activities/processes and system health checks for Production Applications and Infrastructure; ensure SLIs/SLOs are defined and met
- Follow Production Support Processes and provide inputs to continuously strengthen them for App + Infra operations
- Provide status to leads and stakeholders, and work with vendors to review Infra/Application design, fixes, and production deployments
- Coordinate recurring issues and ensure long-term resolution through robust Incident and Problem Management across Infra and Application domains
- Work with Infrastructure, Development, and Platform teams for root cause analysis of complex issues and outages
- Drive strong stakeholder management with focus on service stability, continuous improvement, and delivery excellence across Infra and Applications
- Lead Root Cause Analysis with technology partners and facilitate RCA reviews post incident resolution
- Work with Risk teams to respond to Audit & Risk RFIs; manage audit walkthroughs covering Infrastructure and Application controls

Requirements

- 10–15 years of experience in Banking, with a minimum of 5 years in a Run-the-Bank (RTB) Lead role covering Application and Infrastructure Support
- Strong implementation of Site Reliability Engineering (SRE) principles across Applications and Infrastructure including performance, reliability, monitoring, alerting, and maintenance
- Proactive capacity monitoring and observability of Production Infrastructure (compute, storage, network, platform, MF and DB) with automated alerting and reporting
- Proven experience in automation of Infra & Application support tasks and reducing manual toil
- Build and maintain monitoring and automation solutions for Infrastructure and Application stacks
- Drive service improvements by tracking SLIs/SLOs/SLAs and improving system and infrastructure performance KPIs
- Strong technical understanding across RDBMS, Unix/Linux, Cloud platforms, and Infrastructure components (servers, network, middleware, containers)
- Hands-on knowledge of infrastructure technologies, especially Linux, Database, OpenShift (or container platforms)
- Solid understanding of BAU support, Incident/Problem Management, and escalation management across distributed Infra-App environments
- Good understanding of Infrastructure architecture, capacity planning, DR/BCP, IT security, and regulatory compliance
- Strong collaborator with experience working across global teams and vendors
- Ability to present recommendations effectively in both written and verbal formats
- Proactive, independent, resourceful, and team-oriented mindset

Location: DBS Asia Hub, Job: Technology, Schedule: Regular, Employee Status: Full time
Contact details:

United States Digital Space LLC Recruiting Team