Module: CM2606 Data Engineering
Deadline: 18th April 2025
Submission:
- Report
- Video demonstration (for Question 1)
Key Skills Developed:
- Building ETL pipelines with Airflow
- Data modeling (Star Schema)
- Cloud security & compliance (HIPAA/GDPR)
Data_Engineering_Coursework/
├── CM2606-CW-2025.pdf # Original coursework brief
├── Data Engineering CW Report.pdf # Submitted report (contains all answers)
└── README.md # This overview file
Click image to watch the AWS Airflow implementation
Components Implemented:
- OpenWeather API extraction
- Data transformation to Parquet/CSV
- S3 loading with Airflow DAGs
- Automated scheduling
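
The transformation step listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the report: the function names, the column list `FIELDS`, and the sample payload are assumptions, and only the field names read from the payload (`name`, `dt`, `main.temp`, `main.humidity`, `weather[0].description`) follow the OpenWeather current-weather response. It uses only the standard library, so Parquet output (which would need a third-party library) and the S3 upload are left out. In a DAG, a function like this would typically be the callable of a Python task that runs between the extract and load tasks.

```python
import csv
import io
from datetime import datetime, timezone

# Illustrative column order for the flattened output (an assumption, not from the report).
FIELDS = ["city", "observed_at", "temp_c", "humidity_pct", "description"]


def flatten_observation(payload: dict) -> dict:
    """Flatten one OpenWeather current-weather response into a flat record."""
    observed = datetime.fromtimestamp(payload["dt"], tz=timezone.utc)
    return {
        "city": payload["name"],
        "observed_at": observed.isoformat(),
        # OpenWeather returns Kelvin unless a `units` parameter is passed.
        "temp_c": round(payload["main"]["temp"] - 273.15, 2),
        "humidity_pct": payload["main"]["humidity"],
        "description": payload["weather"][0]["description"],
    }


def to_csv(records: list[dict]) -> str:
    """Serialize flattened records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hand-written sample payload (made-up values) standing in for an API response.
sample = {
    "name": "Colombo",
    "dt": 1744934400,
    "main": {"temp": 303.15, "humidity": 74},
    "weather": [{"description": "scattered clouds"}],
}

record = flatten_observation(sample)
csv_text = to_csv([record])
print(csv_text)
```

Keeping the transform as a pure function of the payload makes it easy to unit-test outside Airflow before wiring it into a scheduled DAG.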
Star Schema Components:
- Dimension Tables
- Fact Table
- Aggregate Tables
*Refer to the report for the diagram and other details.*
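
As a rough illustration of how the three component types relate, the sketch below builds a tiny star schema in an in-memory SQLite database. The table names (`dim_location`, `dim_date`, `fact_weather`, `agg_monthly_weather`), their columns, and the sample rows are all hypothetical; the actual schema is the one in the report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical dimension and fact tables; the real design is in the report.
conn.executescript("""
CREATE TABLE dim_location (
    location_id INTEGER PRIMARY KEY,
    city        TEXT,
    country     TEXT
);
CREATE TABLE dim_date (
    date_id   INTEGER PRIMARY KEY,
    full_date TEXT,
    month     INTEGER,
    year      INTEGER
);
CREATE TABLE fact_weather (
    location_id  INTEGER REFERENCES dim_location(location_id),
    date_id      INTEGER REFERENCES dim_date(date_id),
    temp_c       REAL,
    humidity_pct REAL
);
""")

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO dim_location VALUES (?, ?, ?)",
    [(1, "Colombo", "LK"), (2, "London", "GB")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?, ?)",
    [(20250417, "2025-04-17", 4, 2025), (20250418, "2025-04-18", 4, 2025)],
)
conn.executemany(
    "INSERT INTO fact_weather VALUES (?, ?, ?, ?)",
    [(1, 20250417, 30.0, 74.0), (1, 20250418, 32.0, 70.0), (2, 20250417, 12.0, 60.0)],
)

# Aggregate table: pre-computed monthly averages per city, derived from the fact table.
conn.execute("""
CREATE TABLE agg_monthly_weather AS
SELECT l.city, d.year, d.month,
       AVG(f.temp_c) AS avg_temp_c,
       COUNT(*)      AS readings
FROM fact_weather f
JOIN dim_location l ON l.location_id = f.location_id
JOIN dim_date d     ON d.date_id = f.date_id
GROUP BY l.city, d.year, d.month
""")

rows = conn.execute(
    "SELECT city, avg_temp_c, readings FROM agg_monthly_weather ORDER BY city"
).fetchall()
print(rows)
```

The fact table holds only foreign keys and measures, while descriptive attributes live in the dimensions; the aggregate table trades storage for faster reporting queries.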
- Implement strong authentication (MFA/OAuth)
- Design granular authorization using:
  - Role-Based Access Control (RBAC) or
  - Attribute-Based Access Control (ABAC)
- At-rest encryption: AES-256 standard
- In-transit encryption: TLS 1.3
- Sensitive data handling:
  - Column-level security for PII
  - Dynamic data masking
- Implement audit trails (e.g., AWS CloudTrail)
- Establish data retention policies
- Real-time monitoring (SIEM solutions)
- Secure API gateway implementation
- Token-based authentication
- Rate limiting mechanisms
- Aggregated data exposure only
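
Two of the controls listed above, role-based masking of PII columns and rate limiting, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the design from the report: the role names, the column names, the masking rule, and the `TokenBucket` class are all hypothetical, and a production system would normally rely on the database's or API gateway's built-in features instead.

```python
import time

# Hypothetical policy: PII columns, and which roles may see them unmasked.
PII_COLUMNS = {"patient_name", "email"}
UNMASKED_COLUMNS = {
    "clinician": {"patient_name", "email"},
    "analyst": set(),
}


def mask_value(value: str) -> str:
    """Keep the first character and replace the rest with asterisks."""
    return value[:1] + "*" * (len(value) - 1)


def apply_masking(row: dict, role: str) -> dict:
    """Return a copy of the row with PII masked unless the role is allowed to see it."""
    allowed = UNMASKED_COLUMNS.get(role, set())  # unknown roles see no PII
    return {
        col: mask_value(str(val)) if col in PII_COLUMNS and col not in allowed else val
        for col, val in row.items()
    }


class TokenBucket:
    """Token-bucket rate limiter: up to `capacity` requests in a burst, refilled over time."""

    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return whether the request may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Made-up record for illustration.
row = {"patient_name": "Alice", "email": "a@x.org", "visits": 3}
print(apply_masking(row, "analyst"))
print(apply_masking(row, "clinician"))
```

Defaulting unknown roles to an empty allow-set means a misconfigured role fails closed: it sees masked values only.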
Submitted by: [Loganthan Thusharkanth] (Student ID: [20233168])
