Customer Churn Prediction — Multi-Channel Marketing Analytics

Samiya Islam | Brandeis University, M.S. Business Analytics | May 2026
samiyanurislam.com · LinkedIn · GitHub

Overview

This project builds an end-to-end customer churn prediction pipeline on a 5,630-customer e-commerce dataset. It addresses three questions: which customers are at risk of churning, which marketing signals predict churn, and how retention spend should be allocated across customer segments.

This work targets the multi-channel marketing optimization problem: predicting and attributing customer outcomes across email, push notification, social advertising, and retargeting channels.

Dataset

A realistic synthetic e-commerce customer dataset (5,630 rows, 26 features) modeled on the Kaggle E-Commerce Customer Churn benchmark. Key feature groups:

Category	Features
Behavioral	Tenure, OrderCount, DaySinceLastOrder, CouponUsed, CashbackAmount
Satisfaction	SatisfactionScore, Complain
Demographics	Gender, MaritalStatus, CityTier, NumberOfAddress
Marketing channels	EmailOpensLast30Days, EmailClicksLast30Days, PushNotifClicked, SocialAdClicked, RetargetingExposed, AcquisitionChannel
Engineered	EmailEngagementRate, MultiChannelEngagement

Churn rate: ~12.3% (class imbalance handled via class_weight='balanced' + PR-curve threshold tuning)
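
As a rough illustration of what class_weight='balanced' does, scikit-learn weights each class by n_samples / (n_classes * class_count). The sketch below reproduces that formula in plain Python. The 1,000-row label vector with 123 churners is a hypothetical stand-in chosen to mirror the ~12.3% churn rate, not the project's data, and `balanced_class_weights` is an illustrative helper name.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical labels: 123 churners (1) and 877 retained customers (0).
toy_labels = [1] * 123 + [0] * 877
toy_weights = balanced_class_weights(toy_labels)
# The minority (churn) class receives the larger weight, so errors on
# churners cost more during training.
```

With a split like this, the churn class is weighted roughly seven times more heavily than the retained class, which is why recall on churners rises at the cost of precision.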

Methodology

1. Exploratory Data Analysis

Churn rate by acquisition channel (Paid Search and Social show the highest churn risk)
Distribution comparisons across marketing and behavioral features
Correlation analysis of engagement signals vs. churn

2. Preprocessing

Label encoding for 6 categorical features
Feature engineering: EmailEngagementRate (clicks/opens), MultiChannelEngagement (composite score)
80/20 stratified train-test split
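
The preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the notebook's code: the helper names are hypothetical, mapping zero opens to a rate of 0.0 is an assumed convention the README does not specify, and the hand-rolled split only mimics what scikit-learn's `train_test_split(..., stratify=y)` does in a single call. The MultiChannelEngagement composite is omitted because its definition is not given here.

```python
import random
from collections import defaultdict


def label_encode(values):
    """Map each distinct category to an integer, in sorted order."""
    classes = sorted(set(values))
    mapping = {cls: i for i, cls in enumerate(classes)}
    return [mapping[v] for v in values], mapping


def email_engagement_rate(clicks, opens):
    """Clicks per open; zero opens maps to 0.0 (assumed convention)."""
    return clicks / opens if opens > 0 else 0.0


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), preserving the class ratio in both."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idxs = by_class[y][:]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_size)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

Stratifying matters here because a plain random 80/20 split of a ~12% minority class can leave the test set with a noticeably different churn rate than the training set.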

3. Model Training (class-imbalance corrected)

Model	AUC	F1	Precision	Recall
Logistic Regression	0.739	0.369	0.260	0.633
Random Forest	0.708	0.334	0.212	0.791
CatBoost	0.702	0.349	0.245	0.604
XGBoost	0.628	0.288	0.195	0.547

Decision thresholds were tuned on the Precision-Recall curve rather than left at the default 0.5.
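
One way to tune a threshold on the precision-recall curve is to scan each distinct predicted score as a candidate cutoff and keep the one that maximizes F1. The sketch below assumes F1 is the selection criterion, which this README does not confirm; the function names and toy scores are illustrative. In scikit-learn, `precision_recall_curve` returns the same precision/recall pairs per threshold.

```python
def pr_points(y_true, scores):
    """Return (threshold, precision, recall) for each distinct score cutoff."""
    total_pos = sum(y_true)
    points = []
    for t in sorted(set(scores)):
        predicted = [s >= t for s in scores]
        tp = sum(1 for p, y in zip(predicted, y_true) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, y_true) if p and y == 0)
        precision = tp / (tp + fp)  # tp + fp >= 1 because t is one of the scores
        recall = tp / total_pos if total_pos else 0.0
        points.append((t, precision, recall))
    return points


def best_f1_threshold(y_true, scores):
    """Pick the cutoff with the highest F1 (assumed selection criterion)."""
    best_t, best_f1 = None, -1.0
    for t, precision, recall in pr_points(y_true, scores):
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy example: 1 = churned, scores are predicted churn probabilities.
toy_y = [0, 0, 1, 0, 1, 1]
toy_scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.8]
toy_threshold, toy_f1 = best_f1_threshold(toy_y, toy_scores)
```

On this toy data the selected cutoff is below 0.5, which is the typical pattern when positives are rare and recall is valued. The threshold should be chosen on training or validation data, not on the test set used for the reported metrics.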

4. Interpretability (SHAP)

SHAP TreeExplainer applied to CatBoost model. Top predictors:

Tenure -- newer customers are at much higher risk
DaySinceLastOrder -- recency is a strong leading indicator
SatisfactionScore -- low scores predict churn before cancellation
EmailClicksLast30Days -- email engagement is protective
Complain -- complaint history raises churn probability materially
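
For readers unfamiliar with SHAP, the quantity being attributed is the Shapley value: each feature's average marginal contribution to a prediction over all feature subsets. TreeExplainer computes this efficiently for tree ensembles; the brute-force sketch below only illustrates the definition, using a single baseline row as the reference (a simplification). `toy_risk` and its coefficients are hypothetical and are not fitted to the project's data.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to one baseline row."""
    n = len(x)
    features = list(range(n))

    def value(coalition):
        # Features in the coalition take their real value, others the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return f(z)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical score over [tenure_months, days_since_last_order, complained]."""
    tenure, days_since_order, complained = z
    return 0.5 - 0.02 * tenure + 0.01 * days_since_order + 0.2 * complained


customer = [2, 30, 1]    # new, inactive for 30 days, has complained
reference = [12, 10, 0]  # hypothetical "typical" customer
contributions = shapley_values(toy_risk, customer, reference)
```

The contributions sum to the difference between the customer's score and the reference score, which is the property that makes SHAP plots additive and comparable across features.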

5. Customer Segmentation (K-Means, k=4)

Four actionable customer personas identified, each with a distinct retention strategy (win-back, re-engagement, email nurture, loyalty program).
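
As a reminder of what K-Means does, the sketch below implements the basic assign-then-update loop (Lloyd's algorithm) in plain Python. It is an illustration only: the project's Tech Stack suggests scikit-learn's KMeans, which adds better initialization and multiple restarts. The function names, the seeded random initialization, and the keep-old-centroid rule for empty clusters are assumptions of this sketch. K-Means is distance-based, so features on different scales would normally be standardized first; the README does not say whether that was done.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Cluster a list of tuples into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct rows as starting centroids
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids
```

Calling `kmeans(rows, k=4)` on standardized customer features would yield one label per customer, and averaging the original features within each label gives the kind of profile table stored in segment_summary.csv.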

Key Business Recommendations

Front-load retention spend on customers in their first 6 months (Tenure is the #1 churn driver)
Automate inactivity triggers: email at Day 14, push at Day 21 of no order activity
Complaint → retention flag: route complaining customers to priority handling
Email engagement is measurable and actionable: the model associates higher email engagement with lower churn, so A/B test subject lines and send times to check whether lifting engagement actually reduces churn
Deploy Logistic Regression for production batch scoring (highest AUC, interpretable coefficients, low maintenance)
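
The first three recommendations translate into simple rules that a daily batch job could apply alongside the model score. The sketch below is one hedged reading of them: it assumes Tenure is measured in months, that "first 6 months" means tenure below 6, that triggers fire on the exact day (so a message is sent once rather than daily), and that Complain is a 0/1 flag. The function and action names are hypothetical.

```python
def retention_actions(tenure_months, days_since_last_order, complained):
    """Return the retention actions a customer qualifies for today."""
    actions = []
    if tenure_months < 6:
        # Front-loaded retention spend for new customers.
        actions.append("early_tenure_retention")
    if days_since_last_order == 14:
        actions.append("inactivity_email")
    elif days_since_last_order == 21:
        actions.append("inactivity_push")
    if complained:
        # Complaint history routes the customer to priority handling.
        actions.append("priority_handling")
    return actions


# Hypothetical customers: (tenure_months, days_since_last_order, complained)
new_and_idle = retention_actions(3, 14, 0)
loyal_but_unhappy = retention_actions(24, 21, 1)
active_regular = retention_actions(24, 5, 0)
```

Keeping these rules separate from the model makes them easy to audit and adjust without retraining.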

Files

├── customer_churn_marketing_analytics.ipynb   # Main analysis notebook
├── ecommerce_churn.csv                        # Dataset
├── model_metrics.csv                          # Model comparison table
├── segment_summary.csv                        # Segment profiles
├── figures/
│   ├── 01_eda.png                             # Exploratory analysis
│   ├── 02_roc.png                             # ROC curves
│   ├── 03_model_comparison.png                # Performance comparison
│   ├── 04_shap.png                            # SHAP feature importance
│   └── 05_segments.png                        # Customer segment profiles
└── README.md

Tech Stack

Python · scikit-learn (incl. KMeans) · CatBoost · XGBoost · SHAP · pandas · matplotlib · seaborn

About This Project

Built to demonstrate marketing data science capabilities aligned with multi-channel customer analytics -- specifically: clustering/segmentation, boosted tree modeling, cross-channel outcome attribution, and communicating complex model outputs to non-technical stakeholders.
