Site Reliability Engineer (m/f/d)
Permanent employee, Full-time · Remote within Germany
Your mission
- Design, build, and maintain our infrastructure and tools to enable highly reliable, scalable deployment of services and applications across both cloud-based and on-premises environments
- Implement comprehensive monitoring and observability frameworks that detect and resolve issues proactively, using tools such as Prometheus, Grafana, and Zabbix to track system health and performance metrics
- Develop and manage incident response protocols, including on-call rotations, incident analysis, and postmortems, to continuously improve system reliability and performance
- Automate infrastructure and workflows using Infrastructure as Code (IaC) tools like Ansible
- Optimize system performance through regular performance tuning, capacity planning, and reliability experiments that identify and mitigate potential points of failure
- Collaborate with development teams to advocate for reliable and scalable practices throughout the software development life cycle, and assist in the design and review of new systems and major changes
Your profile
- 5+ years of experience in IT with a focus on system administration and automation
- Expertise in Linux system administration and in using Infrastructure as Code (IaC) tools such as Ansible
- Strong knowledge of scripting and programming in Bash and Python
- Experience with containerization technologies (Docker) and orchestration tools (e.g., Docker Swarm or Kubernetes)
- Experience running demanding Java applications in production, with an understanding of the JVM and Java memory management
- Hands-on data center experience, from cabling and server racking up to and including data center design
- Strong analytical and problem-solving skills, with experience troubleshooting complex issues detected and investigated with monitoring tools
- Effective communication and collaboration skills for working across teams and with stakeholders
- Fluent in English and German
The joy of working with us
- Scale-up company with a market-leading product
- Open culture with diverse international teams
- Flexible working hours
- State-of-the-art equipment
- Personal development support, e.g. access to the learning platform Udemy
- Regular feedback rounds
Contact person:
FACT GmbH HR Team