Big Data and Visualisation Project

Overview

This repository contains examples and tutorials for the Big Data and Visualisation module at MK:U (Milton Keynes University). It demonstrates big data processing techniques across several platforms and tools, including Apache Spark, MongoDB, and a range of visualisation libraries.

Course Information: MK:U Apprenticeships - Big Data and Visualisation

Project Structure

This repository is organised into several key directories, each focusing on different aspects of big data processing and visualisation:

📁 Colab/ - Google Colaboratory Notebooks

Interactive Jupyter notebooks designed to run in the Google Colab environment, featuring:

8 notebooks covering Spark data processing, environmental data analysis, geographic mapping, API integration, MongoDB operations, and chart creation
See Colab/README.md for complete documentation

📁 HDInsight/ - Microsoft Azure HDInsight Notebooks

Specialised notebooks for Azure HDInsight clusters, including:

2 notebooks demonstrating Spark-based data processing and enterprise-grade analytics workflows
See HDInsight/README.md for complete documentation

📁 Python/ - MongoDB and Python Integration

Local Python development environment featuring:

2 files: Main MongoDB integration script (access-mongo.py) and cursor prompts guide (pyMongo_cursor_prompts.md)
MongoDB database operations, noise mapping data analysis, and database querying
See Python/README.md for complete documentation

📁 Zeppelin/ - Apache Zeppelin Notebooks

Apache Zeppelin notebook examples, including:

7 files (4 Jupyter notebooks and 3 native Zeppelin format files) covering interactive data analysis, real-time processing, and property market analysis
See Zeppelin/README.md for complete documentation

📁 docs/ - Unit 2 recap (static game for students)

A single-page interactive recap that presents Spark and big data concepts as a chain game (docs/index.html).
Public URL (available once GitHub Pages is enabled): https://rendzina.github.io/BigDataAndVisualisation/
To enable GitHub Pages, open Settings → Pages → Build and deployment in the repository, select branch main and folder /docs, then save. The site can take a minute to appear.
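
The page can also be previewed locally before it is published. The following is a minimal sketch using only the Python standard library, assuming it is run from the repository root. The function name make_preview_server and the default port 8000 are illustrative choices and are not part of the repository.

```python
import functools
import http.server


def make_preview_server(directory="docs", port=8000):
    """Create a local HTTP server that serves files from `directory`.

    Hypothetical helper for previewing docs/index.html; binds to
    localhost only so the preview is not exposed on the network.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling make_preview_server().serve_forever() from the repository root should serve the recap at http://127.0.0.1:8000/ until the process is interrupted.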

Key Features

Multi-Platform Support: Examples for Google Colab, Azure HDInsight, and local development
Real-World Data: Practical examples using environmental, property, and fuel price datasets
Interactive Visualisations: Maps, charts, and graphs using various plotting libraries
Database Integration: MongoDB operations and data persistence
API Integration: Real-time data fetching and processing
Educational Focus: Step-by-step tutorials with comprehensive documentation

Repository Contents

This repository contains:

8 Google Colab notebooks for cloud-based data processing
2 Azure HDInsight notebooks for enterprise big data analytics
2 Python/ files for MongoDB integration and local development (one script and one prompts guide)
7 Zeppelin notebooks (Jupyter and native formats) for interactive data analysis

Getting Started

Prerequisites

Python 3.7+: Required for local development
MongoDB: For database examples (local installation)
Google Colab Account: For cloud-based notebooks
Azure Subscription: For HDInsight examples (optional)
Apache Zeppelin: For Zeppelin notebook examples (optional)
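
The local prerequisites above can be checked with a short script. This is a sketch rather than a tool shipped with the repository: the helper names are illustrative, and the package list assumes the two packages named in the Quick Start (pymongo and pandas).

```python
import importlib.util
import sys

MIN_PYTHON = (3, 7)                       # minimum version stated in the prerequisites
REQUIRED_PACKAGES = ("pymongo", "pandas")  # assumed from the Quick Start install line


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names of packages that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version OK:", python_ok())
print("Missing packages:", missing_packages() or "none")
```

The script only checks that the packages can be located; it does not verify that a MongoDB server is installed or running.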

Quick Start

For Google Colab:
- Navigate to the Colab/ directory
- Open notebooks directly in Google Colab
- See Colab/README.md for detailed instructions

For Local Development:
- Set up the Python environment in the Python/ directory
- Install required packages: pip install pymongo pandas
- See Python/README.md for setup instructions

For Azure HDInsight:
- Use notebooks from the HDInsight/ directory
- Requires an active Azure subscription
- See HDInsight/README.md for cluster setup

For Zeppelin:
- Import notebooks from the Zeppelin/ directory
- Requires a running Zeppelin server
- See Zeppelin/README.md for configuration
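
For the local development route, a first query against MongoDB might look like the sketch below. It is not taken from access-mongo.py: the connection URI, the database and collection names (bigdata, noise), and the field names (noise_db, area) are all hypothetical placeholders to be replaced with the values used in the Python/ directory.

```python
def build_noise_query(min_db, area=None):
    """Build a MongoDB filter for readings at or above `min_db` decibels.

    The field names `noise_db` and `area` are hypothetical.
    """
    query = {"noise_db": {"$gte": min_db}}
    if area is not None:
        query["area"] = area
    return query


def fetch_noise_readings(
    uri="mongodb://localhost:27017",  # assumed default local server
    db_name="bigdata",                # hypothetical database name
    collection_name="noise",          # hypothetical collection name
    min_db=55.0,
    area=None,
):
    """Run the query and return the matching documents as a DataFrame."""
    # Imported here so build_noise_query works without these packages installed.
    import pandas as pd
    from pymongo import MongoClient

    client = MongoClient(uri, serverSelectionTimeoutMS=2000)
    try:
        collection = client[db_name][collection_name]
        # Exclude MongoDB's internal _id field from the results.
        cursor = collection.find(build_noise_query(min_db, area), {"_id": 0})
        return pd.DataFrame(list(cursor))
    finally:
        client.close()
```

With a local MongoDB server running and the packages installed, fetch_noise_readings(min_db=60).head() would show the first matching rows. Keeping the filter construction in its own function makes it easy to inspect the query without a database connection.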

Educational Objectives

This project supports learning objectives in:

Big Data Processing: Apache Spark, data transformation, and analysis
Data Visualisation: Creating meaningful charts, graphs, and maps
Database Operations: MongoDB integration and querying
Cloud Computing: Working with cloud-based big data platforms
Real-Time Data: API integration and streaming data processing

Contributing

This is an educational project designed for students at MK:U. Contributions that enhance learning outcomes are welcome, including:

Additional examples and tutorials
Improved documentation
Bug fixes and code improvements
New visualisation techniques

License

This project is for educational purposes. Please ensure you have appropriate permissions for any external data sources used.

Author

Originally written by S. Hallett and updated by A. Khouakhi.
Course: MK:U, Big Data and Visualisation
Date: 29/10/2025

This project uses UK spelling conventions throughout and follows PEP 8 coding standards for Python code.
