Data 101 Learning Path

This document provides learning resources to support training for an entry-level data engineering role. The list is not exhaustive; it is meant to help you learn some of the core data engineering concepts we expect at that level. We have included a variety of resources, from articles to online courses, to help you progress towards these learning objectives. At the end we have also listed optional certifications you can pursue to consolidate your knowledge. For comments, feedback, or reports of missing or broken links, please post in the cop-data Slack channel.

If you enjoyed using these learning paths or have feedback, please use this feedback form.

If you want to explore beyond what is in this document, see these further resources: Data Wiki, Awesome Data Engineering.

A rudimentary understanding of SQL and traditional data storage systems

Tutorial SQL (website)

The Complete SQL Bootcamp 2022: Go from Zero to Hero (course)

Database design (video)

What is NoSQL? (website)

OLTP vs OLAP (website)
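
As a rough illustration of the OLTP vs OLAP distinction covered by the resources above, the sketch below uses Python's built-in `sqlite3` module. The `orders` table and its rows are hypothetical and exist only for this example; a real OLAP workload would normally run on a dedicated analytical store rather than SQLite.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 12.5)],
)

# OLTP-style access: modify and read a single row by its key.
conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (40.0, 2))
single_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE id = ?", (2,)
).fetchone()

# OLAP-style access: scan many rows and aggregate them.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(single_order)
print(totals)
```

The first query touches one row at a time, which is typical of transactional systems; the second summarises the whole table, which is the kind of query analytical systems are designed for.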

Writes correct, clean, and testable code, accompanied by appropriate unit testing.

Python tutorial (website)

How to Write Beautiful Python Code With PEP 8 (website)

PySpark tutorial (website)

Pytest (video)
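
To make the objective above concrete, here is a minimal sketch of a small, testable function together with a pytest-style unit test. The function `normalise_emails` and its behaviour are hypothetical, chosen only to show the shape of clean code with an accompanying test.

```python
def normalise_emails(emails):
    """Return unique, lower-cased, whitespace-stripped emails in first-seen order."""
    seen = set()
    cleaned = []
    for email in emails:
        value = email.strip().lower()
        # Skip blanks and anything we have already kept.
        if value and value not in seen:
            seen.add(value)
            cleaned.append(value)
    return cleaned


def test_normalise_emails_removes_duplicates_and_blanks():
    # pytest collects functions whose names start with "test_".
    raw = [" Ada@Example.com", "ada@example.com ", "", "bob@example.com"]
    assert normalise_emails(raw) == ["ada@example.com", "bob@example.com"]
```

Saved in a file such as `test_emails.py` (a hypothetical name), this can be run with the `pytest` command. Keeping the function free of I/O is what makes it easy to test in isolation.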

Confident in writing stages of an automated CI/CD pipeline, including compiling code, unit testing, code analysis, security, and artifact creation.

CI/CD pipeline (website)

AWS: Real-world CodePipeline CI/CD Examples (video)

Builds and implements a scalable data pipeline, incorporating low event latency and interactive querying, using versioning, monitoring, and testing to ensure reliability.

How to Build a Scalable Data Analytics Pipeline (website)

Data Pipeline Architecture (website)

Building Scalable Machine Learning Pipelines for Multimodal Health Data on AWS (case study)

Batch vs Real Time Data Processing (website)

Data Stream Processing Concepts and Implementations (video)
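
The batch vs real-time distinction in the resources above can be sketched in a few lines of plain Python. This is a simplified illustration under assumed names (`batch_total`, `streaming_totals`, and the `readings` data are all hypothetical), not a substitute for a real stream processing framework.

```python
from typing import Iterable, Iterator


def batch_total(events: Iterable[float]) -> float:
    """Batch style: wait for the full dataset, then compute one result."""
    return sum(events)


def streaming_totals(events: Iterable[float]) -> Iterator[float]:
    """Streaming style: emit an updated result as each event arrives."""
    total = 0.0
    for value in events:
        total += value
        yield total


# Hypothetical sensor readings, for illustration only.
readings = [3.0, 4.5, 2.5]

print(batch_total(readings))
print(list(streaming_totals(readings)))
```

The batch function produces a single answer after all the data is available, while the generator yields a running answer after every event, which is the property that keeps event latency low in a streaming pipeline.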

Distinguishes between a data warehouse and a data lake, assessing the relative benefits of each approach.

Data Lake vs Data Warehouse: What’s the Difference? (website)

Data Warehouse (website)

Databricks Data Lake (website)

Uses IaC to build, change, and manage infrastructure in a safe, consistent, and repeatable way by defining resource configurations that can be versioned, reused, and shared.

Terraform explained in 15 mins (video)

Complete Terraform Course (video)

HashiCorp Learn (website)

IaC tool comparison (website)

Data Engineer Exams

AWS Cloud Practitioner

Azure Data Fundamentals

Databricks Lakehouse Fundamentals