Site Reliability Engineer (Datadog)

Datacentrix

  • Johannesburg, Gauteng
  • Permanent
  • Full-time
Qualifications and Experience:
  • Datadog Certified Fundamentals: Must have
  • Degree in Information Technology or Computer Science
  • Management of operations on virtualized and distributed infrastructures
  • Management of operations in environments with clustering, replication, and load balancing
  • ITIL Practitioner (V3) / ITIL Specialist (V4)
  • Windows Server: Advantage
  • 1–3 years of experience working with a modern monitoring/observability tool, ideally Datadog (or alternatives like Prometheus, Grafana, New Relic, or Dynatrace).
  • Experience in:
    • Deploying and configuring monitoring agents
    • Creating dashboards and monitors
    • Parameterizing tags and labels for proper data correlation
  • Basic familiarity with cloud platforms (AWS, Azure, or GCP) and container environments (Docker/Kubernetes)
  • Experience working with Centreon: Advantage
  • Strong interest in monitoring, DevOps, SRE, or cloud infrastructure
  • Knowledge of basic scripting (e.g., Bash, Python) is a plus
Duties:
  • Support the design, implementation, and optimization of Datadog monitoring solutions across infrastructure, applications, and services.
  • Work alongside DevOps, infrastructure, and application teams to ensure complete observability using custom dashboards, alerts, and tagging strategies.
  • Assist in the deployment and onboarding of new systems into the monitoring ecosystem.
  • Serve as the go-to person for building visualizations, improving signal-to-noise ratios in alerting, and aligning monitoring with business objectives.
  • Ideal for a young and motivated engineer looking to grow within observability and cloud-native monitoring.
  • Deploy and configure Datadog agents across various environments (cloud and on-prem).
  • Create and customize dashboards, monitors, and alerts for systems, services, containers, and applications.
  • Implement tagging strategies to organize, filter, and correlate metrics and logs effectively.
  • Integrate Datadog with various platforms (AWS, Azure, GCP, Kubernetes, Docker, etc.) to collect telemetry data.
  • Collaborate with developers, DevOps, and infrastructure teams to identify key business and system metrics to monitor.
  • Continuously tune and optimize monitors to reduce false positives and improve actionable alerting.
  • Document dashboards, alert logic, best practices, and knowledge for cross-team enablement.
  • Analyze incidents and outages in post-mortems to identify monitoring gaps and improve visibility.
  • Help evangelize observability practices within the organization and contribute to monitoring-as-code efforts (e.g., Terraform for Datadog resources).
  • Stay up to date with new Datadog features and industry trends in observability and monitoring.

ExecutivePlacements.com

Similar Jobs

  • Site Manager

    Cre8work

    • Germiston, Gauteng
    Job Responsibilities: General Management of Customer - Fleet (R/Wheels). Customer driven. To establish and maintain SLA to clients. Systems and Procedures set out. Admin F…
  • Big Data Data Engineer

    PBT Group

    • Johannesburg, Gauteng
    We are seeking a skilled Data Engineer to design and develop scalable data pipelines that ingest raw, unstructured JSON data from source systems and transform it into clean, struct…
  • Data Engineer

    PBT Group

    • Johannesburg, Gauteng
    Data Engineer - Azure Data Factory & Databricks. Join PBT Group and help us build the future of data-driven innovation. PBT Group is looking for an experienced Data Engineer wit…