Site Reliability Engineer

Experian

  • Sandton, Johannesburg
  • Permanent
  • Full-time
  • 12 days ago
Company Description
Experian unlocks the power of data to create opportunities for consumers, businesses and society. At life’s big moments – from buying a home or car, to sending a child to college, to growing a business exponentially by connecting it with new customers – we empower consumers and our clients to manage their data with confidence so they can maximize every opportunity.
We gather, analyze and process data in ways others can’t. We help individuals take financial control and access financial services, businesses make smarter decisions and thrive, lenders lend more responsibly, and organizations prevent identity fraud and crime. For more than 125 years, we’ve helped consumers and clients prosper, and economies and communities flourish – and we’re not done.
Our 22 000 people in 32 countries believe the possibilities for you, and our world, are growing. We’re investing in new technologies, talented people and innovation so we can help create a better tomorrow.
Job Description
Why this role is critical to us
  • As part of the next phase in our growth, we are looking to expand our Site Reliability Engineering team to offer round-the-globe cover. As an organisation, we are convinced that everything should be automated and that software should run software, and we believe in the Site Reliability Engineering model. We have established a platform using cutting-edge technology such as Kubernetes, containers, pipelines and monitoring. The candidate will be a forward-looking engineer with an understanding of how SRE will enable operations in the future. You will have broad operations and automation interests, will not shy away from the operational aspects of life, and will understand that the best way to build reliability is to break things often.
  • The ideal candidate will have either experience in operations, a passion for automation and an interest in software development, or experience in software development, a passion for automation and an interest in operational excellence. Incident management skills and the ability to manage rationally and calmly during a crisis would be an added bonus. You will be expected to work occasional peak weekends and to meet some on-call requirements. This is the beginning of a growing team and we are looking for individuals to grow with it.
  • You will lead the team’s technical vision bridging the gap across platforms, infrastructure, automation and software.
  • You will be able to review and design non-functional requirements, prioritise key areas of operational architecture and guide both operational staff and software feature engineers on SRE best practice.
What you’ll need to bring to the party
  • Excellent communication skills, written and verbal
  • Highly organised and with good attention to detail
  • Customer orientated
  • Working across boundaries: geographical, team, language and cultural
  • Curious and willing and able to learn new technologies and practices
  • Cloud aware: you understand how cloud technologies differ from other technical approaches and can explain the differences to others
  • Lives and breathes availability and operational excellence in technology
What you’ll be doing
  • Uptime of Experian Platforms Software: ExperianOne – Experian’s Cloud SaaS offering for Decision Analytics and Fraud specific platforms.
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
  • Partner with development teams or equivalent to improve services through rigorous testing and release procedures
  • Run the production environment by monitoring availability and taking a holistic view of system health
  • Think about systems - edge cases, failure modes, behaviours, specific implementations
  • Ensure monitoring and alerting trigger on symptoms rather than on outages
  • Responding to incidents and restoring service
  • Over time, gaining a good enough understanding of the systems to efficiently triage issues and find owners for problem resolution
  • An ability to identify an issue or a manual process and ensure that it never recurs: solving, improving, documenting
  • Incident management; able to co-ordinate others and be co-ordinated during service disruptions with a focus on restoring availability
  • Ability to write complex queries using various tools
  • Ability to identify high level root cause from symptoms, e.g. Networks, Application, Compute, Storage.
  • Understanding of Kubernetes, Infrastructure as Code and high-availability principles
  • Excellent communication skills in English with colleagues across the globe.
  • Strong relationships with other members of the SRE team in EMEA & APAC and with the Global SRE team
  • Working relationships with colleagues in other departments and with third parties who support backing applications
  • Collaborative relationships with developers, security and architects to influence them to build resilient, maintainable solutions
  • Proficiency in one programming or scripting language and willingness to apply software development best practices to an operational role
Qualifications
  • Matric
  • IT related qualification
  • More than 5 years’ experience in supporting complex, highly scaled systems in production
  • Linux knowledge, with experience in troubleshooting and in anticipating issues before they occur
  • Networking, troubleshooting and monitoring
  • Cloud Native application designs for high performance, scalability and resilience
  • Incident Management and co-ordination, Blameless PIRs
  • Experience in: Kubernetes, OpenShift, EKS, Splunk, Dynatrace, ThousandEyes, ServiceNow, Jira, Jenkins, Python
  • Experience in: Java, Cassandra, Redis, Rundeck, MongoDB, Apigee, Okta, PostgreSQL, AWS, Azure, GCP
  • Infrastructure as Code, GitOps
Additional Information
Experian Careers - Creating a better tomorrow together.