Point-of-Interest recommendation via GeoMF (Geographical Matrix Factorization)
Supports PyTorch mini-batch BPR training, multi-negative sampling, and learning rate scheduling.
# 1. Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate
# 2. Install dependencies
pip install -r requirements.txt
Note: This project runs on CPU by default (no GPU required).
- Go to the Yelp Academic Dataset page: https://www.yelp.com/dataset
- Download the latest `yelp_academic_dataset.zip`.
- Unzip and copy the JSON files into `data/raw/`:
  unzip yelp_academic_dataset.zip -d data/raw/
- Because `yelp_academic_checkin.json` and `yelp_academic_tips.json` carry similar information for this project, we drop `yelp_academic_checkin.json`.
Place the Yelp JSON files under data/raw/:
data/raw/
├── business.json
├── review.json
├── checkin.json # optional
├── user.json
└── tip.json
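The raw files can be streamed one record at a time rather than loaded whole. The sketch below assumes the files are newline-delimited JSON (one object per line) and that review records carry `user_id` and `business_id` fields; `user2idx` and the helper names are hypothetical, and the project's own preprocessing scripts may build their indices differently.

```python
import io
import json
from collections import Counter


def iter_json_lines(handle):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


def build_index(records):
    """Assign contiguous integer ids to users and businesses and count visits."""
    user2idx, biz2idx, counts = {}, {}, Counter()
    for rec in records:
        # setdefault keeps the existing id, or assigns the next free one
        u = user2idx.setdefault(rec["user_id"], len(user2idx))
        b = biz2idx.setdefault(rec["business_id"], len(biz2idx))
        counts[(u, b)] += 1
    return user2idx, biz2idx, counts


# Tiny in-memory stand-in for data/raw/review.json (made-up ids).
sample = io.StringIO(
    '{"user_id": "u1", "business_id": "b1", "stars": 5}\n'
    '{"user_id": "u2", "business_id": "b1", "stars": 3}\n'
    '{"user_id": "u1", "business_id": "b1", "stars": 4}\n'
)
user2idx, biz2idx, counts = build_index(iter_json_lines(sample))
```

For a real file, replace the `io.StringIO` object with `open("data/raw/review.json", encoding="utf-8")`.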
- Construct interaction matrices
  python src/data/inter_matrix.py \
    --review data/raw/review.json \
    --checkin data/raw/checkin.json \
    --out-dir data/processed
  Generates `R.npz` (user×item ratings) and `W.npz` (weight matrix).
- Generate geographic grid and influence matrix
  Adjust the grid parameters in `config/default.yaml` as needed:
  grid:
    delta_lat: 0.1
    delta_lon: 0.1
    sigma: 0.2
    radius: 3
  Then run:
  python src/data/grid.py \
    --biz_json data/raw/business.json \
    --biz2idx data/processed/biz2idx.json \
    --config config/default.yaml \
    --out_dir data/processed
  python src/features/influence.py \
    --biz2idx data/processed/biz2idx.json \
    --centers data/processed/grid_centers.npy \
    --biz2grid data/processed/biz2grid.npy \
    --config config/default.yaml \
    --out data/processed/Y.npz
  Optional: For subset compression, use `src/data/compress_Y.py`.
- Split into train/test sets
  python src/data/split.py \
    --R data/processed/R.npz \
    --W data/processed/W.npz \
    --out_dir data/processed \
    --test_ratio 0.3 \
    --seed 123
  Produces `R_train.npz`, `R_test.npz`, and `W_train.npz`.
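To give a feel for what the grid parameters control, here is a minimal sketch of one row of an influence matrix such as `Y.npz`. It rests on assumptions that the source does not confirm: `delta_lat` and `delta_lon` are cell sizes in degrees, `sigma` is a Gaussian bandwidth in degrees, and `radius` counts neighboring cells. The function names are hypothetical and `src/features/influence.py` may compute this differently.

```python
import math


def grid_cell(lat, lon, delta_lat=0.1, delta_lon=0.1):
    """Map a coordinate to integer (row, col) grid indices."""
    return math.floor(lat / delta_lat), math.floor(lon / delta_lon)


def cell_center(row, col, delta_lat=0.1, delta_lon=0.1):
    """Return the (lat, lon) of the middle of a grid cell."""
    return (row + 0.5) * delta_lat, (col + 0.5) * delta_lon


def influence_row(lat, lon, delta_lat=0.1, delta_lon=0.1, sigma=0.2, radius=3):
    """Gaussian influence of one POI on every cell within `radius` cells of its own."""
    r0, c0 = grid_cell(lat, lon, delta_lat, delta_lon)
    weights = {}
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            clat, clon = cell_center(r0 + dr, c0 + dc, delta_lat, delta_lon)
            # squared distance in degrees (an assumption; km would also be plausible)
            d2 = (lat - clat) ** 2 + (lon - clon) ** 2
            weights[(r0 + dr, c0 + dc)] = math.exp(-d2 / (2 * sigma ** 2))
    return weights


# Made-up coordinate, defaults taken from the grid block in config/default.yaml.
row = influence_row(53.52, -113.52)
```

With `radius: 3` each POI touches a 7×7 block of cells, so a smaller radius or sigma gives a sparser matrix.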
The project now uses two-phase hybrid training:
- Strict GeoMF (ALS + projected gradient) as per the paper.
- BPR fine-tuning (mini-batch, multi-negative sampling) for ranking.
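The ranking objective in the second phase can be illustrated with the standard BPR loss, which pushes the score of an observed item above the scores of sampled negatives. This is a pure-Python sketch for a single user; it uses a plain dot product as a stand-in for the model score, and the project's PyTorch implementation may score items differently (for example, by adding a geographic term).

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def bpr_loss(p_u, q_pos, q_negs):
    """Mean of -log(sigmoid(score_pos - score_neg)) over the sampled negatives."""
    s_pos = dot(p_u, q_pos)
    total = 0.0
    for q_neg in q_negs:
        diff = s_pos - dot(p_u, q_neg)
        # -log(sigmoid(diff)) == log(1 + exp(-diff)), in a numerically stable form
        total += max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return total / len(q_negs)


p_u = [1.0, 0.0]                     # made-up user factors
q_pos = [2.0, 0.0]                   # item the user visited
q_negs = [[0.0, 1.0], [2.0, 0.0]]    # two sampled negatives
loss = bpr_loss(p_u, q_pos, q_negs)
```

The loss shrinks as the positive item's score moves further above the negatives' scores, and equals log 2 per pair when the two scores tie.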
Use train.py with the following options:
| Argument | Description | Default |
|---|---|---|
| `--sample_users` | Subsample first N users for debugging | none |
| `--sample_items` | Subsample first N items for debugging | none |
| `--K` | Latent factor dimension | 50 |
| `--max_iter` | Alternating ALS+PG iterations for strict GeoMF | 20 |
| `--gamma` | L2 regularization coefficient (γ) for P and Q | 0.01 |
| `--lam` | L1 regularization coefficient (λ) for X | 0.1 |
| `--eta` | Learning rate for X projected gradient | 1e-3 |
| BPR fine-tuning | | |
| `--bpr_epochs` | Number of BPR fine-tuning epochs | 5 |
| `--bpr_lr` | Learning rate for BPR optimizer | 1e-3 |
| `--bpr_neg` | Negative samples per user in BPR | 5 |
| `--bpr_batch` | Batch size for BPR DataLoader | 256 |
| `--bpr_workers` | Number of DataLoader worker processes | 4 |
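The roles of `--lam` and `--eta` are easiest to see in a single projected-gradient step on X. The sketch below assumes X is constrained to be non-negative, as in the GeoMF formulation, so the L1 penalty contributes a constant gradient of λ and the projection simply clips at zero. The function name and the gradient values are made up for illustration, and the project's actual update may differ.

```python
def projected_gradient_step(x, grad, lam=0.1, eta=1e-3):
    """One step on a non-negative vector x with an L1 penalty of weight lam."""
    # For x >= 0 the L1 norm equals sum(x), so its gradient is lam in every coordinate.
    # max(0, .) projects the result back onto the non-negative orthant.
    return [max(0.0, xi - eta * (gi + lam)) for xi, gi in zip(x, grad)]


x = [0.5, 0.0001, 0.0]
grad = [-1.0, 2.0, 0.3]  # made-up gradient of the data-fit term
x_new = projected_gradient_step(x, grad)
```

Small entries are driven to exactly zero, which is how a larger `--lam` yields a sparser X.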
Example (5k×5k subset, hybrid training):
python train.py \
--sample_users 5000 \
--sample_items 5000 \
--K 100 \
--max_iter 20 \
--gamma 0.01 \
--lam 0.1 \
--eta 1e-3 \
--bpr_epochs 5 \
--bpr_lr 1e-3 \
--bpr_neg 5 \
--bpr_batch 512 \
--bpr_workers 4
The script first runs strict GeoMF (ALS + PG, with a tqdm progress bar), then performs BPR fine-tuning, using a multi-worker DataLoader for efficient negative sampling and gradient updates.
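Negative sampling for BPR can be sketched as rejection sampling: draw random item ids and keep those the user has not interacted with. The helper below is hypothetical and uses only the standard library. The project performs this step inside its DataLoader, whose details are not shown here.

```python
import random


def sample_negatives(positives, n_items, n_neg=5, rng=random):
    """Draw n_neg item ids the user has not interacted with (with replacement)."""
    if len(positives) >= n_items:
        raise ValueError("user has interacted with every item")
    negatives = []
    while len(negatives) < n_neg:
        j = rng.randrange(n_items)
        if j not in positives:  # reject items the user already visited
            negatives.append(j)
    return negatives


rng = random.Random(123)  # seeded for reproducibility
negs = sample_negatives({0, 3, 7}, n_items=10, n_neg=5, rng=rng)
```

Rejection sampling is cheap when each user has visited only a small fraction of the catalogue, which is the usual case for POI data.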
Use evaluate.py with options:
python evaluate.py \
--sample_users 2000 \
--sample_items 2000 \
--K 5 10 20 50 100
Outputs Recall@K and Precision@K for the specified K values.
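For reference, the two metrics can be computed per user as in the following sketch. The function name and the toy ranking are illustrative. `evaluate.py` presumably averages such values over all test users, though its exact conventions (for example, how users without held-out items are handled) are not stated here.

```python
def precision_recall_at_k(ranked_items, relevant, k):
    """Precision@k and Recall@k for one user's ranked recommendation list."""
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


ranked = [4, 9, 1, 7, 3]   # item ids, best first (made-up)
held_out = {9, 3, 8}       # the user's test-set items (made-up)
results = {k: precision_recall_at_k(ranked, held_out, k) for k in (1, 3, 5)}
```

Here the top-3 list contains one of the three held-out items, so both Precision@3 and Recall@3 equal 1/3.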
- Grid search on a mid-scale subset (e.g., 2000×2000): K ∈ {50,100,200}, γ ∈ {1e-3,1e-2,1e-1}, λ ∈ {1e-2,1e-1,1}, η ∈ {1e-4,1e-3}
- Compress Y for each subset via `src/data/compress_Y.py`.
- Increase num_negatives for stronger ranking signals.
- Adjust batch_size to balance noise and memory.
- Two-stage retrieval: coarse P·Q retrieval → GeoMF-BPR reranking.
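The grid search in the first tip can be enumerated with `itertools.product`. This sketch only builds the command lines, using the `train.py` flags listed earlier. It does not run them, and `iter_configs` and `to_command` are hypothetical helpers rather than part of the repository.

```python
from itertools import product

search_space = {
    "K": [50, 100, 200],
    "gamma": [1e-3, 1e-2, 1e-1],
    "lam": [1e-2, 1e-1, 1],
    "eta": [1e-4, 1e-3],
}


def iter_configs(space):
    """Yield one dict per point in the Cartesian product of the search space."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def to_command(cfg, subset=2000):
    """Format a config as a train.py invocation on a subset x subset sample."""
    args = [f"--sample_users {subset}", f"--sample_items {subset}"]
    args += [f"--{name} {value}" for name, value in cfg.items()]
    return "python train.py " + " ".join(args)


configs = list(iter_configs(search_space))
commands = [to_command(cfg) for cfg in configs]
```

The full grid has 3 × 3 × 3 × 2 = 54 configurations, so running it on a mid-scale subset keeps the total cost manageable.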
GeoMF-Rec/
├── README.md
├── config/
│ └── default.yaml
├── data/
│ ├── raw/
│ └── processed/
├── src/
│ ├── data/
│ ├── features/
│ └── model/
├── train.py
├── evaluate.py
└── requirements.txt
For issues or questions, please open an issue or contact the maintainer.
zhenye2@ualberta.ca
This project is licensed under the MIT License. See the LICENSE file for details.