Senior DevOps Engineer
Impact Tech
- Cape Town, Western Cape
- Permanent
- Full-time
- Passionate about its people
- Focused on delivering the very best tech to our customers
- Offering the flexibility to work how and where you are most successful
- Obsessed with our customers' success
- The leading SaaS platform to automate partnerships - affiliate, influencer, technology partners, and more!
- Entrepreneurial in spirit with a culture that rewards collaboration and curiosity
- Obsessed with making a difference in business and to the wider community
- Lead the evaluation and adoption of new technologies and tools within the DevOps squad, ensuring alignment with the organization's objectives and future needs
- Take a proactive approach to bug triage, including root cause analysis and fixing, to ensure the stability and reliability of the systems and applications supported by the DevOps team.
- Collaborate with cross-functional teams, including developers, QA engineers, product managers, and stakeholders, to define and achieve departmental and organizational objectives on a quarterly basis.
- Drive the automation of complex and critical manual tasks and optimize repetitive processes across the development, testing, deployment, and monitoring phases of the software development lifecycle.
- Provide mentorship and technical leadership to junior team members, including participating in pair programming sessions, conducting code reviews, and sharing best practices.
- Lead research and prototyping efforts for emerging technologies, exploring opportunities for innovation and improvement in existing systems and processes.
- Take ownership of creating and maintaining comprehensive documentation, including design documents, user guides, and test plans, to ensure clear communication and knowledge transfer within the team and across departments.
- Drive the implementation of robust software testing and quality assurance processes, including the development and maintenance of automated tests at unit, functional, and integration
- Lead incident management efforts, including responding to alerts, reviewing error messages, and diagnosing and resolving technical issues in a timely manner to minimize impact on system availability and performance.
- Ensure the stability and scalability of the infrastructure and platform by maintaining build and stage stability, optimizing resource utilization, and implementing infrastructure as code
- Lead efforts to ensure compliance with industry regulations (e.g., GDPR, HIPAA) and internal security policies, including implementing security best practices, conducting security audits, and managing access controls.
- Monitor system performance and usage trends, and conduct capacity planning to anticipate and accommodate future growth and scaling needs, ensuring the infrastructure can support increasing
- Implement cost optimization strategies for cloud infrastructure and services, including rightsizing resources, leveraging reserved instances, and implementing tagging and monitoring to track and optimize costs.
- Provide leadership and support for on-call rotations, including participating in incident response activities, conducting post-mortem analyses, and implementing corrective actions to prevent
- Evaluate and manage relationships with third-party vendors and service providers, including cloud providers, software vendors, and infrastructure partners, to ensure the organization receives value and meets its objectives.
- Work closely with business stakeholders to understand their needs, requirements, and technical specifications, translating them into actionable plans and solutions that deliver value to the company.
- Extensive experience with Linux operating systems (e.g., Ubuntu, CentOS, Red Hat), including installation, configuration, maintenance, and troubleshooting.
- Understanding of Linux networking concepts, including IP addressing, routing, firewalls (e.g., iptables), and network
- Knowledge of Linux security best practices, including user management, permissions, and encryption.
- Proficiency in performance tuning techniques for Linux systems, including optimizing kernel parameters, disk I/O tuning, memory management, and CPU utilization.
- A good understanding of IaC (infrastructure as code) principles and of adopting these methods to drive automation and self-service.
- A comprehensive grasp of coding and scripting in common languages, including Python, Perl, PHP, and Ruby.
- Familiarity with at least one primary coding language, like C++ or
- Familiarity with containerization technologies such as Docker and container orchestration platforms like Kubernetes, used for managing and scaling containerized applications on Linux-based systems.
- Understanding of high availability concepts and technologies such as Linux clustering, load balancing (e.g., HAProxy, Nginx, F5, Traefik), and failover mechanisms for ensuring system reliability and uptime.
- The ability to identify, evaluate, and integrate diverse open-source technologies and cloud services.
- Proven experience with business and CI/CD tools like Prometheus, GitHub, Atlassian Jira, Confluence, and Jenkins.
- Proven experience with public cloud resources and services, including AWS, Microsoft Azure, and Google Cloud.
- Familiarity with various IT monitoring and management tools like Cloudflare and
- Proficiency in troubleshooting and resolving technical issues across staging, UAT and production
- A strong focus on security, including adherence to NIST and CIS standards and the ability to implement security hardening measures for Linux servers and environments, such as patch management, vulnerability scanning, intrusion detection, and security compliance auditing.
- Integrity and ethical leadership, demonstrating honesty, transparency, and fairness in all interactions, and upholding ethical standards and values in decision-making and
- Ability to lead and mentor junior team members, providing guidance, support, and feedback to help them grow and develop their skills in DevOps practices and technologies.
- Excellent communication skills, both verbal and written, to effectively communicate technical concepts and ideas to non-technical stakeholders, facilitate discussions, and build consensus across teams.
- Strong problem-solving skills and the ability to make sound decisions under pressure, analyzing complex technical issues, evaluating options, and implementing effective solutions that drive
Benefits/Perks:
- Unlimited PTO policy
- Take the time off that you need. We are truly committed to a positive work-life balance, recognising that it is important to be happy and fulfilled in both
- Training & Development
- Learning the advanced partnership automation products
- Medical Aid and Provident Fund
- Group schemes with Discovery & Bonitas for medical aid
- Group scheme with Momentum for provident fund
- Stock Options
- 3-year vesting schedule pending Board approval
- Internet Allowance
- Flexible work hours
- Casual work environment