This repository contains examples and tutorials for the Big Data and Visualisation module at MK:U (Milton Keynes University). The project demonstrates big data processing techniques across several platforms and tools, including Apache Spark, MongoDB, and a range of visualisation libraries.
Course Information: MK:U Apprenticeships - Big Data and Visualisation
This repository is organised into several key directories, each focusing on different aspects of big data processing and visualisation:
Interactive Jupyter notebooks designed to run in the Google Colab environment, featuring:
- 8 notebooks covering Spark data processing, environmental data analysis, geographic mapping, API integration, MongoDB operations, and chart creation
- See Colab/README.md for complete documentation
Specialised notebooks for Azure HDInsight clusters, including:
- 2 notebooks demonstrating Spark-based data processing and enterprise-grade analytics workflows
- See HDInsight/README.md for complete documentation
Local Python development environment featuring:
- 2 files: Main MongoDB integration script (access-mongo.py) and cursor prompts guide (pyMongo_cursor_prompts.md)
- MongoDB database operations, noise mapping data analysis, and database querying
- See Python/README.md for complete documentation
Apache Zeppelin notebook examples for:
- 7 files (4 Jupyter notebooks and 3 native Zeppelin format files) covering interactive data analysis, real-time processing, and property market analysis
- See Zeppelin/README.md for complete documentation
- Single-page interactive recap: Spark / big-data concepts as a chain game (docs/index.html).
- Public URL (after GitHub Pages is switched on): https://rendzina.github.io/BigDataAndVisualisation/
To switch on GitHub Pages, go to Settings → Pages → Build and deployment in the repo, select Branch main and folder /docs, then save. The site can take a minute to appear.
- Multi-Platform Support: Examples for Google Colab, Azure HDInsight, and local development
- Real-World Data: Practical examples using environmental, property, and fuel price datasets
- Interactive Visualisations: Maps, charts, and graphs using various plotting libraries
- Database Integration: MongoDB operations and data persistence
- API Integration: Real-time data fetching and processing
- Educational Focus: Step-by-step tutorials with comprehensive documentation
This repository contains:
- 8 Google Colab notebooks for cloud-based data processing
- 2 Azure HDInsight notebooks for enterprise big data analytics
- 2 Python directory files (a MongoDB integration script and a cursor prompts guide) for local development
- 7 Zeppelin notebooks (Jupyter and native formats) for interactive data analysis
- Python 3.7+: Required for local development
- MongoDB: For database examples (local installation)
- Google Colab Account: For cloud-based notebooks
- Azure Subscription: For HDInsight examples (optional)
- Apache Zeppelin: For Zeppelin notebook examples (optional)
- For Google Colab:
  - Navigate to the Colab/ directory
  - Open notebooks directly in Google Colab
  - See Colab/README.md for detailed instructions
- For Local Development:
  - Set up the Python environment in the Python/ directory
  - Install required packages: pip install pymongo pandas
  - See Python/README.md for setup instructions
- For Azure HDInsight:
  - Use notebooks from the HDInsight/ directory
  - Requires an active Azure subscription
  - See HDInsight/README.md for cluster setup
- For Zeppelin:
  - Import notebooks from the Zeppelin/ directory
  - Requires a running Zeppelin server
  - See Zeppelin/README.md for configuration
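For the local development route, a first query against MongoDB might look roughly like the sketch below. It is an illustration only and is not taken from access-mongo.py: the connection URI, the database name noise_db, the collection name readings, and the field names site and level_db are all hypothetical, so substitute the names used in your own database.

```python
"""Minimal sketch: query MongoDB and summarise the results.

Assumptions (hypothetical, not taken from access-mongo.py): the URI,
database name, collection name, and field names used below.
"""
from statistics import mean


def build_filter(min_level_db):
    """Return a MongoDB filter for readings at or above a noise level."""
    return {"level_db": {"$gte": min_level_db}}


def mean_level_by_site(documents):
    """Group documents by 'site' and average their 'level_db' values."""
    levels = {}
    for doc in documents:
        levels.setdefault(doc["site"], []).append(doc["level_db"])
    return {site: mean(values) for site, values in levels.items()}


def fetch_readings(uri="mongodb://localhost:27017", min_level_db=60):
    """Fetch matching documents from a local MongoDB instance."""
    # Imported here so the helpers above work without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        collection = client["noise_db"]["readings"]  # hypothetical names
        # Exclude the ObjectId so the documents hold plain Python values.
        return list(collection.find(build_filter(min_level_db), {"_id": 0}))
    finally:
        client.close()
```

With a local MongoDB server running, mean_level_by_site(fetch_readings()) would give a mean level per site, and pandas.DataFrame(fetch_readings()) is one way to carry the same documents into pandas for further analysis.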
This project supports learning objectives in:
- Big Data Processing: Apache Spark, data transformation, and analysis
- Data Visualisation: Creating meaningful charts, graphs, and maps
- Database Operations: MongoDB integration and querying
- Cloud Computing: Working with cloud-based big data platforms
- Real-Time Data: API integration and streaming data processing
This is an educational project designed for students at MK:U. Contributions that enhance learning outcomes are welcome, including:
- Additional examples and tutorials
- Improved documentation
- Bug fixes and code improvements
- New visualisation techniques
This project is for educational purposes. Please ensure you have appropriate permissions for any external data sources used.
Originally written by S. Hallett and updated by A. Khouakhi.
Course: MK:U, Big Data and Visualisation
Date: 29/10/2025
This project uses UK spelling conventions throughout and follows PEP 8 coding standards for Python code.