
Senior Data Engineer (Remote)
- Cape Town, Western Cape
- Permanent
- Full-time
- ETL/ELT Pipeline Development:
- Design, develop, and optimize efficient and scalable ETL/ELT pipelines using Python, PySpark, and Apache Airflow.
- Implement batch and real-time data processing solutions using Apache Spark (an illustrative streaming sketch follows this list).
- Ensure data quality, governance, and security throughout the data lifecycle.
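
For illustration only, a minimal Spark Structured Streaming sketch of the kind of real-time processing this role covers; the paths, schema, and application name are hypothetical assumptions, not details of the position:

```python
# A minimal sketch (not part of the posting): stream JSON events into Parquet.
# All paths, field names, and the app name below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

# Hypothetical event schema.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("payload", StringType()),
])

# Pick up new JSON files as they land in the input directory.
events = spark.readStream.schema(schema).json("/data/incoming")

# Append each micro-batch as Parquet; the checkpoint tracks progress so the
# job can restart without reprocessing.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "/data/curated")
    .option("checkpointLocation", "/data/checkpoints/events")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```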
- Cloud Data Engineering:
- Manage and optimize cloud infrastructure (Azure) for data processing workloads, with a focus on cost-effectiveness.
- Implement and maintain CI/CD pipelines for data workflows to ensure smooth and reliable deployments.
- Big Data & Analytics:
- Develop and optimize large-scale data processing pipelines using Apache Spark and PySpark.
- Implement data partitioning, caching, and performance tuning techniques to enhance Spark-based workloads (a short sketch follows this list).
- Work with diverse data formats (structured and unstructured) to support advanced analytics and machine learning initiatives.
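
For illustration, a minimal sketch of the partitioning and caching techniques named above; the partition count, paths, and column names are assumptions, not project details:

```python
# A minimal sketch (not part of the posting): repartition on an aggregation
# key and cache a DataFrame that feeds two actions. Paths, columns, and the
# partition count are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

events = spark.read.parquet("/data/events")

# Co-locate rows that share a key so the aggregations below shuffle less.
events = events.repartition(200, "customer_id")

# Cache once because the same DataFrame feeds two separate write actions.
events.cache()

daily_counts = events.groupBy("event_date").count()
per_customer = events.groupBy("customer_id").count()

daily_counts.write.mode("overwrite").parquet("/data/daily_counts")
per_customer.write.mode("overwrite").parquet("/data/per_customer")

events.unpersist()  # release cached blocks once both writes have finished
```

Whether repartitioning pays off depends on data volume and skew; the point of the sketch is the technique, not the numbers.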
- Workflow Orchestration (Airflow):
- Design and maintain DAGs (Directed Acyclic Graphs) in Apache Airflow to automate complex data workflows (see the sketch after this list).
- Monitor, troubleshoot, and optimize job execution and dependencies within Airflow.
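
For illustration, a minimal two-task Airflow DAG of the kind this bullet describes, using Airflow 2.4+ syntax; the DAG id, task logic, and schedule are hypothetical:

```python
# A minimal sketch (not part of the posting): a daily DAG with one dependency.
# The dag_id, task names, and placeholder callables are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("extract: pull source data")       # placeholder logic


def transform():
    print("transform: run the PySpark job")  # placeholder logic


with DAG(
    dag_id="example_pipeline",       # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # Airflow 2.4+ keyword; run once per day
    catchup=False,                   # do not backfill past runs
):
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    extract_task >> transform_task   # transform runs only after extract succeeds
```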
- Team Leadership & Collaboration:
- Provide technical guidance and mentorship to a team of data engineers in India.
- Foster a collaborative environment and promote best practices for coding standards, version control, and documentation.
- Client-facing role, so strong communication and collaboration skills are vital.
- Requirements:
- Proven experience in data engineering, with hands-on expertise in Azure Data Services, PySpark, Apache Spark, and Apache Airflow.
- Strong programming skills in Python and SQL, with the ability to write efficient and maintainable code.
- Deep understanding of Spark internals, including RDDs, DataFrames, DAG execution, partitioning, and performance optimization techniques.
- Experience with designing and managing Airflow DAGs, scheduling, and dependency management.
- Knowledge of CI/CD pipelines, containerization technologies (Docker, Kubernetes), and DevOps principles applied to data workflows.
- Excellent problem-solving skills and a proven ability to optimize large-scale data processing tasks.
- Prior experience leading teams and working in Agile/Scrum development environments.
- A track record of working effectively with global, remote teams.
- Experience with data modelling and data warehousing concepts.
- Familiarity with data visualization tools and techniques.
- Knowledge of machine learning algorithms and frameworks.
- Our Values:
- Stay Curious: Being hungry to learn and grow, always asking the big questions.
- Seek Clarity: Embracing complexity to create clarity and inspire action.
- Own the Outcome: Being accountable for decisions and taking ownership of our choices.
- Center on the Client: Relentlessly adding value for our customers.
- Be a Challenger: Never complacent, always striving for continuous improvement.
- Champion Inclusivity: Fostering trust in relationships by engaging with empathy, respect, and integrity.
- Commit to each other: Contributing to making Circana a great place to work for everyone.