
Senior Data Engineer
- Durban, KwaZulu-Natal
- Permanent
- Full-time
- Design, develop, and maintain the end-to-end technical aspects (on-premises and/or cloud hardware) of a high-performance compute and storage platform built specifically for data pipelines in a data-intensive research environment
- Design, develop, and maintain the end-to-end data pipelines (software, databases, and processes) required to support the research scientists and data managers
- Support ETL processes, including data ingestion, transformation, validation, and integration, using various tools and frameworks
- Optimize data performance, scalability, and security
- Provide technical guidance and support to data analysts and research scientists.
- Design data integrations and data quality frameworks
- Collaborate with the rest of the IT department to help develop the long-term strategy for the scientific Big Data platform architecture
- Document and clearly communicate data engineering processes and solutions
- Use and help define the cutting-edge technologies, processes, and tools needed to drive technology within our science and research data management departments
- Bachelor's degree or higher in Computer Science, IT, Engineering, Mathematics, or a related field
- Industry-recognized IT certifications and technology qualifications, such as database and data-related certifications
- This is a technical role, so the focus is strongly on technical skills and experience
- 7+ years' experience in Data Engineering, High-Performance Computing, Data Warehousing, or Big Data Processing
- Strong experience with technologies such as Hadoop, Kafka, NiFi, or Spark, or with cloud-based big data processing environments like Amazon Redshift, Google BigQuery, and Azure Synapse Analytics
- At least 5 years' advanced experience and very strong proficiency in UNIX, Linux, and Windows operating systems, preferably including containerization technologies such as Docker and Kubernetes
- Working knowledge of data-related programming, scripting, and data engineering tools such as Python, R, Julia, T-SQL, and PowerShell
- Strong working experience with compute and virtualization platforms such as VMware, Hyper-V, OpenStack, and KVM
- Strong working experience with hardware compute platforms including high performance compute cluster hardware and related technologies.
- Strong experience with relational database technologies such as MS SQL, MySQL, and PostgreSQL, as well as NoSQL databases such as MongoDB and Cassandra
- Experience with Big Data technologies such as Hadoop, Spark, and Hive
- Experience with data pipeline tools such as Airflow, Spark, Kafka, or Dataflow
- Experience working with containerization is advantageous
- Experience with data quality and testing tools such as Great Expectations, dbt, or DataGrip is advantageous
- Experience with cloud-based Big Data technologies (AWS, Azure, etc.) is advantageous
- Experience with data warehouse and data lake technologies such as BigQuery, Redshift, or Snowflake is advantageous
- Strong experience designing end-to-end data pipelines, including the underlying compute hardware infrastructure
- Strong knowledge of data modeling, architecture, and governance principles
- Strong Linux Administration skills
- Programming skills in various languages advantageous
- Strong data security and compliance experience
- Excellent communication, collaboration, and problem-solving skills
- Ability to work independently and as part of a cross-functional team
- Interest and enthusiasm for medical scientific research and its applications