This document provides learning resources to help with training for an entry-level role. The list is not exhaustive; it is intended to help you learn some of the core data engineering concepts we expect at that level. We have included a variety of resources, from articles to online courses, to help you progress towards these learning objectives. At the end we have also listed optional certifications you can pursue to consolidate your knowledge. For any comments, feedback, or reports of missing or broken links, please message the cop-data Slack channel.
If you enjoyed using these learning paths or have feedback, please use this feedback form
If you want to explore beyond what is in this document, please look at the links below for further resources: Data Wiki, Awesome Data Engineering
The Complete SQL Bootcamp 2022: Go from Zero to Hero (course)
How to Write Beautiful Python Code With PEP 8 (website)
Confidently writes the stages of an automated CI/CD pipeline, including compiling code, unit testing, code analysis, security scanning, and artifact creation.
AWS: Real-world CodePipeline CI/CD Examples (video)
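The stages listed in this objective can be thought of as a fail-fast sequence: each stage must pass before the next runs, and artifact creation only happens if everything before it succeeds. The sketch below illustrates that control flow only; the stage names are placeholders, and a real pipeline (e.g. CodePipeline, as in the video above) would shell out to build and test tools rather than call Python functions.

```python
def run_pipeline(stages):
    """Run CI/CD stages in order, stopping at the first failure (fail fast)."""
    completed = []
    for name, stage in stages:
        ok = stage()
        completed.append((name, ok))
        if not ok:
            break  # later stages (e.g. artifact creation) never run
    return completed

# Hypothetical stages; each lambda stands in for a real build/test command.
stages = [
    ("compile", lambda: True),
    ("unit tests", lambda: True),
    ("code analysis", lambda: False),  # simulate a lint failure
    ("security scan", lambda: True),
    ("create artifact", lambda: True),
]

result = run_pipeline(stages)
```

Here the pipeline stops after the failed code-analysis stage, so the security scan and artifact creation never run, which is the behaviour you want from an automated pipeline.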
Builds and implements a scalable data pipeline that supports low event latency and interactive querying, using versioning, monitoring, and testing to ensure reliability.
How to Build a Scalable Data Analytics Pipeline (website)
Data Pipeline Architecture (website)
Building Scalable Machine Learning Pipelines for Multimodal Health Data on AWS (case study)
Batch vs Real Time Data Processing (website)
Data Stream Processing Concepts and Implementations (video)
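The batch-versus-streaming distinction in the resources above comes down to when results are produced: batch processing collects all events first and computes one answer, while stream processing emits an updated answer as each event arrives. A minimal illustration, using a running average as the stand-in computation:

```python
from typing import Iterable, Iterator


def batch_average(events: list[float]) -> float:
    """Batch: every event is collected first, then processed in one pass."""
    return sum(events) / len(events)


def streaming_average(events: Iterable[float]) -> Iterator[float]:
    """Stream: emit an updated running average as each event arrives,
    without ever holding the full dataset in memory."""
    total, count = 0.0, 0
    for value in events:
        total += value
        count += 1
        yield total / count


events = [10.0, 20.0, 30.0]
batch_result = batch_average(events)              # one result at the end
stream_results = list(streaming_average(events))  # a result per event
```

The streaming version trades a single exact final answer for low-latency intermediate answers, which is the core trade-off the batch vs real-time articles discuss.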
Distinguishes between a data warehouse and a data lake, assessing the relative benefits of each approach.
Data Lake vs Data Warehouse: What’s the Difference? (website)
Databricks Data Lake (website)
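One way to make the lake/warehouse distinction concrete is schema-on-read versus schema-on-write: a lake stores raw, heterogeneous records as-is, while a warehouse enforces a fixed schema when data is loaded. The sketch below illustrates that idea with an in-memory SQLite table standing in for the warehouse; the event fields are invented for the example.

```python
import json
import sqlite3

# "Data lake" side: raw, heterogeneous records kept unmodified (schema-on-read).
raw_events = [
    {"user": "alice", "action": "login", "device": "mobile"},
    {"user": "bob", "action": "purchase", "amount": 9.99},  # extra field is fine
]
lake = [json.dumps(event) for event in raw_events]

# "Data warehouse" side: a fixed schema enforced at load time (schema-on-write).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, action TEXT)")
for event in raw_events:
    conn.execute(
        "INSERT INTO events VALUES (?, ?)",
        (event["user"], event["action"]),  # fields outside the schema are dropped
    )
rows = conn.execute("SELECT user, action FROM events ORDER BY user").fetchall()
```

The lake keeps every field for future, as-yet-unknown analyses, while the warehouse gives up that flexibility in exchange for a consistent, queryable structure, the trade-off the articles above assess.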
Uses IaC to build, change, and manage infrastructure in a safe, consistent, and repeatable way by defining resource configurations that can be versioned, reused, and shared.
Terraform explained in 15 mins (video)
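The core idea behind IaC tools like Terraform is declarative convergence: you describe the desired state, the tool compares it with the current state, and it derives the create/update/delete actions needed to reconcile the two. The toy sketch below mimics that diffing step in plain Python; the resource names and attributes are made up, and this is an illustration of the concept, not how Terraform is implemented.

```python
def plan(current: dict, desired: dict) -> dict:
    """Compare current infrastructure state with the desired configuration
    and return the actions needed to converge (analogous to `terraform plan`)."""
    return {
        "create": sorted(set(desired) - set(current)),
        "delete": sorted(set(current) - set(desired)),
        "update": sorted(
            name for name in set(current) & set(desired)
            if current[name] != desired[name]
        ),
    }


# Hypothetical resources: a VPC whose CIDR changed, a bucket to remove,
# and a new server to create.
current_state = {"vpc": {"cidr": "10.0.0.0/16"}, "old_bucket": {}}
desired_state = {"vpc": {"cidr": "10.1.0.0/16"}, "app_server": {"size": "t3.micro"}}
actions = plan(current_state, desired_state)
```

Because the configuration is just data, it can be versioned, reviewed, and reused, which is what makes the approach safe and repeatable.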