
Data Engineer (CPT – Onsite) (3-month Contract to go perm)
- Cape Town, Western Cape
- Contract
- Full-time
- Design & Build Data Pipelines: Design, develop, and deploy scalable, production-grade data pipelines on Google Cloud Platform using services like Cloud Dataflow, Cloud Dataproc, and Cloud Functions to ingest, process, and transform large volumes of structured and unstructured data.
- Architect Data Warehouses & Lakes: Develop and maintain their core data warehouse architecture in BigQuery and their data lake on Google Cloud Storage, ensuring data integrity, security, performance, and cost optimization.
- Data Modeling & ETL/ELT: Design and implement robust data models and schemas within BigQuery. Build and manage complex ETL/ELT processes using SQL, Dataflow, and other GCP-native tools to support analytics and reporting.
- Custom API Integration: Design, build, and maintain custom RESTful APIs using Python (e.g., Flask, FastAPI) and PHP to handle data ingress from and egress to various internal systems, bot platforms, and third-party services.
- Integration & Streaming: Develop and maintain data integration solutions to connect various internal and external data sources using Pub/Sub for real-time streaming and Cloud Data Fusion for batch integration.
- Monitoring & Optimization: Proactively monitor, troubleshoot, and optimize data pipelines and BigQuery performance. Use Cloud Monitoring and Cloud Logging to ensure reliability, identify bottlenecks, and resolve issues.
- Data Governance: Implement and manage data quality and governance frameworks within the GCP ecosystem, leveraging tools like Dataplex for data discovery, metadata management, and policy enforcement.
- Collaboration: Work closely with data scientists, analysts, and software engineers to understand data requirements, translate them into technical specifications, and integrate data solutions into production applications.
- Innovation & Best Practices: Stay current with the latest GCP data services and industry best practices. Champion and implement innovative solutions to continuously improve their data platform.
- Documentation: Create and maintain comprehensive documentation for data pipelines, architectures, and processes to ensure knowledge sharing and long-term maintainability.
- Proven Experience: Demonstrated experience as a Data Engineer, with a strong portfolio of designing and building data solutions specifically on Google Cloud Platform.
- Programming Proficiency: Strong proficiency in backend programming with Python and PHP for data processing, API development, and infrastructure automation. Ability to write clean, efficient, and maintainable code.
- SQL Mastery: Expert-level SQL skills, with extensive experience writing complex queries, optimizing performance, and modeling data in BigQuery.
- Core GCP Services: Hands-on, in-depth experience with core GCP data services, including BigQuery, Cloud Storage, Cloud Dataflow, and Pub/Sub.
- Data Architecture: Solid understanding of modern data warehousing concepts (e.g., Kimball, Inmon), data lake architectures, distributed systems, and data governance best practices within a cloud context.
- Data Integration: Proficiency in various data integration techniques, including batch processing, real-time streaming, and API-based data ingestion on GCP.
- Orchestration: Experience with workflow orchestration tools, preferably Cloud Composer (managed Apache Airflow).
- Infrastructure as Code (IaC): Familiarity with managing GCP resources using IaC tools like Terraform.
- Database Knowledge: Experience with relational databases like Cloud SQL (MySQL, PostgreSQL) and NoSQL databases.
- Machine Learning: Familiarity with ML concepts and experience with GCP's ML services, such as Vertex AI or BigQuery ML.
- Data Visualization: Experience building reports and dashboards using Looker or Looker Studio.
- Security & Compliance: Knowledge of GCP security best practices, IAM roles, and data compliance standards (e.g., GDPR, POPIA).
- Software Engineering Practices: Understanding of CI/CD principles and experience using tools like Cloud Build and version control systems like Git.
- Problem-Solving: Excellent analytical and problem-solving skills, with the ability to work effectively in a fast-paced, agile environment.
- Communication: Strong verbal and written communication skills, with the ability to explain complex technical concepts to both technical and non-technical stakeholders.