Site Reliability Engineer
Datafin
- Gauteng
- Permanent
- Full-time
Responsibilities:
- Designing, implementing, and maintaining CI/CD pipelines for Kubernetes-based applications.
- Automating deployment processes and ensuring continuous integration and delivery of software.
- Implementing monitoring solutions for infrastructure and applications using tools such as Prometheus, Grafana, and Kubernetes-native monitoring.
- Generating reports on system performance, availability, and reliability.
- Analysing logs and metrics to identify trends, anomalies, and performance issues.
- Implementing log aggregation and analysis solutions such as the ELK Stack or Splunk.
- Investigating and resolving issues related to application performance, availability, and reliability in Kubernetes environments.
- Collaborating with development teams to diagnose and debug complex issues.
- Setting up alerting mechanisms to proactively detect and respond to incidents.
- Escalating critical issues to appropriate teams and stakeholders.
- Managing and maintaining Linux servers, including installation, configuration, and patch management.
- Implementing security measures and best practices for Linux-based systems.
- Managing user accounts, groups, and permissions in Active Directory.
- Performing routine maintenance on the Active Directory infrastructure and keeping it secure.
- Configuring and managing DNS servers and zones.
- Troubleshooting DNS-related issues and ensuring DNS resolution reliability.
- Providing technical support and assistance to end-users for infrastructure-related issues.
- Resolving hardware, software, and connectivity problems promptly.
- Managing PostgreSQL databases, including installation, configuration, and performance tuning.
- Performing routine database maintenance tasks such as backups, restores, and upgrades.
Requirements:
- 3+ years of experience in a Site Reliability Engineer role or similar position.
- Proficiency in Kubernetes administration and experience with CI/CD pipelines.
- Strong Linux administration skills, including shell scripting and troubleshooting.
- Experience with monitoring and logging tools such as Prometheus, Grafana, ELK Stack, or Splunk.
- Familiarity with Active Directory administration and DNS management.
- Experience with PostgreSQL database administration is a plus.
- Excellent communication and problem-solving skills.
- Ability to work effectively in a fast-paced, collaborative environment.
Careers24