Machine Learning from Scratch

Building fundamental machine learning algorithms from the ground up using Python and NumPy.

Project Overview

This repository is an educational journey into the inner workings of machine learning. By implementing these algorithms from scratch, I aim to move beyond "black-box" libraries like Scikit-Learn or TensorFlow and gain a deep, mathematical understanding of how these models truly function.

The focus is on clarity, mathematical rigor, and efficient implementation using vectorized operations in NumPy.

Algorithms Implemented

Supervised Learning: Regression

Linear Regression: Simple linear modeling using gradient descent.
Decision Tree (Regression): Tree-based regression splitting on MSE reduction.
Gradient Boosting Machine (Regression): Ensemble technique using additive training for regression.
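
As an illustration of the first entry, the following is a minimal sketch of linear regression fitted by batch gradient descent on mean squared error. It is not the code from this repository: the function name `fit_linear_regression`, the learning rate, and the iteration count are assumptions chosen for the example.

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.1, n_iters=1000):
    """Fit y ~ X @ w + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration; lr and n_iters are arbitrary defaults.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        residual = X @ w + b - y                      # shape (n_samples,)
        grad_w = 2.0 / n_samples * (X.T @ residual)   # d(MSE)/dw
        grad_b = 2.0 / n_samples * residual.sum()     # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Noiseless synthetic data, so the true parameters should be recovered.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 0.5
w, b = fit_linear_regression(X, y)
```

The whole gradient is computed with two matrix products per step, which is the kind of vectorized NumPy operation the project emphasizes.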

Supervised Learning: Classification

Logistic Regression: Binary classification using the sigmoid function and cross-entropy loss.
Decision Tree (Classification): Classification tree using Gini impurity or entropy for splits.
Random Forest: Ensemble method using bagging and random feature selection.
Adaptive Boosting (AdaBoost): Boosting method focusing on misclassified samples.
Gradient Boosting Machine (Classification): GBM implementation adapted for binary classification.
XGBoost: eXtreme Gradient Boosting with regularization and a second-order Taylor expansion of the loss function.
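
The split criteria named for the classification tree can be sketched as follows. This is an illustrative sketch under assumed names (`gini_impurity`, `entropy`, `split_impurity`), not the repository's implementation.

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def split_impurity(labels, mask, criterion=gini_impurity):
    """Size-weighted impurity of the two children produced by a boolean mask."""
    left, right = labels[mask], labels[~mask]
    n = len(labels)
    return (len(left) * criterion(left) + len(right) * criterion(right)) / n

labels = np.array([0, 0, 1, 1])
mask = np.array([True, True, False, False])  # a split that separates the classes
parent = gini_impurity(labels)               # 0.5 for a balanced binary node
children = split_impurity(labels, mask)      # 0.0 because both children are pure
```

A tree builder would evaluate `split_impurity` for each candidate threshold and keep the split with the largest reduction relative to the parent node.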

Deep Learning

(Fully Connected) Neural Network: Multi-layer perceptron with backpropagation and ReLU/Softmax activations. Tested on MNIST.
Convolutional Neural Network: CNN implementation featuring convolution, pooling, and flattening layers. Tested on Fashion-MNIST.
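
To make the ReLU/Softmax combination concrete, here is a small sketch of the forward pass of a two-layer perceptron. The layer sizes (784 inputs, 32 hidden units, 10 classes), the weight initialization, and the function names are assumptions for illustration and may differ from the repository's network.

```python
import numpy as np

def relu(z):
    """Element-wise max(z, 0)."""
    return np.maximum(z, 0.0)

def softmax(z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, W1, b1, W2, b2):
    """Hidden layer with ReLU, output layer with softmax class probabilities."""
    hidden = relu(X @ W1 + b1)
    return softmax(hidden @ W2 + b2)

# Random stand-in for a batch of 4 flattened 28x28 images (not real MNIST data).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 784))
W1 = rng.normal(scale=0.01, size=(784, 32))
b1 = np.zeros(32)
W2 = rng.normal(scale=0.01, size=(32, 10))
b2 = np.zeros(10)
probs = forward(X, W1, b1, W2, b2)  # shape (4, 10), each row sums to 1
```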

Unsupervised Learning

K-Means Clustering: Classic centroid-based clustering algorithm.
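
The centroid-based procedure can be sketched as Lloyd's algorithm: assign each point to its nearest centroid, then move each centroid to the mean of its points. The function name `kmeans`, the random initialization from data points, and the stopping rule below are assumptions made for this sketch, not details taken from the repository.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm with centroids initialized from k distinct data points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Pairwise distances between points and centroids, shape (n_samples, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated groups of four points each.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
centroids, labels = kmeans(X, k=2)
```

The distance computation uses broadcasting instead of a Python loop over points, in line with the NumPy-first approach of the project.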

Key Features

NumPy-First: All core logic is implemented using NumPy for efficient numerical and matrix operations.
Consistent API: Models generally follow a consistent fit(X, y) and predict(X) interface, making them easy to test and swap.
Pure Implementation: Avoids high-level machine learning frameworks for the model logic itself.
Visualization: Many implementations include loss curves or result visualizations using Matplotlib.
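
The benefit of a shared fit(X, y) / predict(X) interface is that evaluation code does not need to know which model it receives. The sketch below shows the idea with two hypothetical classes, `MeanRegressor` and `LeastSquaresRegressor`, and a hypothetical helper `mse_of`; none of these names come from the repository, and returning `self` from `fit` is an assumption.

```python
import numpy as np

class MeanRegressor:
    """Hypothetical baseline: always predicts the mean of the training targets."""
    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

class LeastSquaresRegressor:
    """Hypothetical model: ordinary least squares with an intercept column."""
    def fit(self, X, y):
        A = np.column_stack([X, np.ones(len(X))])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_

def mse_of(model, X_train, y_train, X_test, y_test):
    """Works with any object exposing fit(X, y) and predict(X)."""
    model.fit(X_train, y_train)
    return float(np.mean((model.predict(X_test) - y_test) ** 2))

X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0
scores = {
    type(m).__name__: mse_of(m, X[:15], y[:15], X[15:], y[15:])
    for m in (MeanRegressor(), LeastSquaresRegressor())
}
```

Swapping one model for another only changes the object passed to `mse_of`, which is what makes the implementations easy to compare and test.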

Getting Started

Prerequisites

Python 3.x
NumPy
Pandas (for data loading)
Matplotlib (for visualization)
Scikit-Learn (only for data preprocessing utilities like train_test_split)
idx2numpy (for MNIST-style datasets)

Installation

Clone the repository:

```
git clone https://github.com/zjzhao1002/Machine-Learning-from-Scratch.git
cd Machine-Learning-from-Scratch
```

Install dependencies:
```
pip install -r requirements.txt
```

Running an Example

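Each algorithm is isolated in its own directory with a demo script that shows its usage. The script is typically main.py, though some directories use a variant name such as main_classification.py.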

```
cd XGBoost
python main_classification.py
```

Project Structure

```
Machine-Learning-from-Scratch/
├── AdaBoost/                      # AdaBoost implementation
├── Convolutional_Neural_Network/  # CNN from scratch
├── Decision_Tree_Classifier/      # Tree-based classification
├── KMeans/                        # K-Means clustering
├── Linear_Regression/             # Simple linear regression
├── Neural_Network/                # Fully connected ANN
└── ...                            # Other algorithms
```

Each directory typically contains:

<Algorithm>.py: The model implementation.
main.py: Demo script.
README.md: Detailed documentation and mathematical derivation for that specific algorithm.

Future Roadmap / TODO

The following models and features are planned, to broaden the project's scope:

Support Vector Machines (SVM): Implementing the dual optimization problem and kernel trick.
K-Nearest Neighbors (KNN): A simple but powerful distance-based algorithm.
Naive Bayes: Probabilistic classification based on Bayes' Theorem.
Principal Component Analysis (PCA): Dimensionality reduction using eigendecomposition or SVD.
Recurrent Neural Networks (RNN/LSTM): Handling sequential data and time-series analysis.
DBSCAN: Density-based spatial clustering of applications with noise.
Optimization Algorithms: Implementing advanced optimizers like Adam, RMSprop, and Adagrad for neural networks.

Related Projects

Generative Pre-Trained Transformer (GPT) from Scratch: An implementation of a GPT-style transformer model.

References & Inspiration

Neural Networks and Deep Learning by Michael Nielsen.
Understanding Deep Learning by Simon J.D. Prince.
XGBoost Official Documentation.
