
Uncertainty Makes It Stable: Curiosity-Driven Quantized Mixture-of-Experts

curious-qmoe is a curiosity-driven quantized Mixture-of-Experts framework for efficient audio classification on resource-constrained edge devices. It achieves 99.9% of full-precision accuracy with 4× compression and an 82% reduction in latency variance through Bayesian epistemic uncertainty-based routing.

Key Features:

  • Heterogeneous Quantization: BitNet ternary, BitLinear (1-16 bit), post-training quantization (PTQ) with bitwise operations
  • Curiosity-Driven Routing: Bayesian router with Monte Carlo dropout for epistemic uncertainty estimation
  • Mixture-of-Experts: Dynamic expert selection across quantized models for adaptive precision
  • Hardware-Efficient: Optimized for edge deployment with predictable latency (29 ms std)
  • Comprehensive Evaluation: Energy consumption, carbon emissions, and statistical significance testing
  • Reproducible: Hydra configuration management, cross-validation, experiment tracking

Datasets: ESC-50, Quinn, UrbanSound8K
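The curiosity-driven routing idea can be sketched with Monte Carlo dropout. The snippet below is a hypothetical illustration, not the repo's actual Bayesian Router: dropout is kept active at inference, several stochastic forward passes are averaged, and the variance of the routing probabilities serves as the epistemic-uncertainty (curiosity) signal. The layer sizes, dropout rate, and class names here are assumptions.

```python
import torch
import torch.nn as nn

class MCDropoutRouter(nn.Module):
    """Hypothetical MC-dropout router; not the repo's BayesianRouter."""
    def __init__(self, in_dim: int, num_experts: int, p: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Dropout(p),
            nn.Linear(64, num_experts),
        )

    def forward(self, x: torch.Tensor, mc_samples: int = 10):
        self.train()  # keep dropout stochastic at inference time
        with torch.no_grad():
            probs = torch.stack(
                [self.net(x).softmax(dim=-1) for _ in range(mc_samples)]
            )  # (mc_samples, batch, num_experts)
        mean_probs = probs.mean(dim=0)
        uncertainty = probs.var(dim=0).sum(dim=-1)  # per-sample epistemic signal
        expert = mean_probs.argmax(dim=-1)          # top-1 expert selection
        return expert, uncertainty

router = MCDropoutRouter(in_dim=1536, num_experts=3)
expert, unc = router(torch.randn(4, 1536), mc_samples=10)
print(expert.shape, unc.shape)  # torch.Size([4]) torch.Size([4])
```

With top_k > 1, the same mean probabilities would feed a top-k selection instead of the argmax.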


Setup

conda create -n curious-qmoe python=3.11 -y
conda activate curious-qmoe
git clone https://github.com/sebasmos/QWave.git
cd QWave
pip install -e .

Quick Start

Basic Usage

cd scripts
python benchmark.py \
  --config-path /path/to/curious-qmoe/config \
  --config-name esc50 \
  experiment.datasets.esc.csv=/path/to/esc-50.csv \
  experiment.device=cpu \
  experiment.models_to_run=[esc]

MoE with Curiosity Mode

python benchmark.py \
  --config-path /path/to/curious-qmoe/config \
  --config-name esc50 \
  experiment.device=cpu \
  experiment.datasets.esc.csv=/path/to/esc-50.csv \
  experiment.models_to_run=[moe] \
  experiment.router.expert_quantizations="[bitnet,'1','2','4','8','16',qesc]" \
  experiment.router.num_experts=3 \
  experiment.router.top_k=1 \
  experiment.router.use_curiosity=true \
  experiment.metadata.tag=esc_moe_curiosity

Curiosity outputs (saved per fold):

  • curiosity_values.json - Raw uncertainty values
  • curiosity_histogram.png - Distribution of epistemic uncertainty
  • curiosity_per_class.png - Average uncertainty per class
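The per-fold JSON can be inspected directly after a run. A minimal sketch, assuming curiosity_values.json holds a flat list of per-sample uncertainty floats (the actual schema may differ); the demo values written here are made up so the snippet is self-contained:

```python
import json
import statistics

# Demo data standing in for a real per-fold output file (schema assumed).
with open("curiosity_values.json", "w") as f:
    json.dump([0.12, 0.30, 0.08, 0.21], f)

with open("curiosity_values.json") as f:
    values = json.load(f)

mean, std = statistics.mean(values), statistics.stdev(values)
print(f"n={len(values)} mean={mean:.3f} std={std:.3f}")
```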

Project Structure

curious-qmoe/
├── config/                    # Hydra configs
│   └── esc50.yaml             # ESC-50 configuration
├── curious_qmoe/              # Core source code
│   ├── datasets.py            # EmbeddingDataset and normalization
│   ├── models.py              # Neural architectures (MLP, ESCModel)
│   ├── bitnnet.py             # BitNet quantized layers
│   ├── qmoe_layers.py         # Quantized MoE layers
│   ├── moe.py                 # MoE training and Bayesian Router
│   ├── train_utils.py         # Training/validation utilities
│   ├── memory.py              # Model size calculation
│   ├── graphics.py            # Plotting (ROC, losses, curiosity)
│   └── utils.py               # Helpers (seeding, device, metrics)
├── scripts/
│   ├── benchmark.py           # Main benchmarking pipeline
│   └── tables/                # Results analysis scripts
│       ├── analyze-std.py           # Generate tables with mean±std
│       ├── analyze-significance.py  # Statistical testing (t-tests, Levene)
│       └── README-significance.md   # Model nomenclature reference
├── outputs/                   # Auto-generated results
└── pyproject.toml

Results Analysis

After running experiments, analyze results with the scripts in scripts/tables/:

1. Verify Paper Numbers

Deterministic verification of all MoE-related numbers against paper tables:

python scripts/tables/verify_paper_tables.py

Output: Per-value comparison with OK/MISMATCH status for Tables 1, 3, 4, 5, and prose claims.

2. Generate Tables (mean±std)

Create 5 tables with mean±std from 5-fold cross-validation:

cd scripts/tables
python analyze-std.py

Output: tables-std/ folder with 5 CSV files:

  • table1-cross_dataset_performance_std.csv (Table 1)
  • table2-ablation_quantization_std.csv (Table 2)
  • table3-moe_curiosity_std.csv (Table 3)
  • table4-inference_latency_std.csv (Table 4)
  • supplementary-carbon_emissions_std.csv (Supplementary)

3. Statistical Significance Testing

Run paired t-tests and variance tests:

cd scripts/tables
python analyze-significance.py

Output: significance-tests/ folder with 6 CSV files:

  • F1-score comparisons (Tables 1-3)
  • Latency speedup tests (Table 4)
  • Energy efficiency tests (Table 3)
  • Variance reduction analysis (Levene's test)
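The two tests named above are standard SciPy calls. A minimal sketch with made-up per-fold numbers (not results from the paper), assuming analyze-significance.py runs comparisons of this shape: a paired t-test on per-fold F1 scores and Levene's test on latency variance:

```python
from scipy import stats

# Made-up per-fold F1 scores for two models (5-fold CV).
f1_baseline = [0.912, 0.905, 0.918, 0.909, 0.914]
f1_moe_c    = [0.921, 0.917, 0.925, 0.915, 0.922]

# Paired t-test: are the per-fold F1 differences significant?
t_stat, p_val = stats.ttest_rel(f1_moe_c, f1_baseline)
print(f"paired t-test: t={t_stat:.2f}, p={p_val:.4f}")

# Levene's test: do the two models differ in latency variance?
lat_fp32 = [110, 180, 95, 210, 150]  # made-up latencies (ms)
lat_moe  = [102, 108, 99, 111, 105]
w_stat, p_var = stats.levene(lat_fp32, lat_moe)
print(f"Levene's test: W={w_stat:.2f}, p={p_var:.4f}")
```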

Model nomenclature: See scripts/tables/README-significance.md for standardized names (FP32-Base, Q8-Base-PTQ, etc.).


Config Overview

Key parameters in config/esc50.yaml:

experiment:
  models_to_run: [esc]  # Options: esc, bitnet, moe, qmoe, 1, 2, 4, 8, 16, qesc
  device: "cpu"  # or "cuda", "mps"

  datasets:
    esc:
      csv: "/path/to/esc-50.csv"
      normalization_type: "standard"

  model:
    batch_size: 64
    hidden_sizes: [640, 320]
    learning_rate: 0.0005793146438537801
    epochs: 10

  router:  # For MoE models
    expert_quantizations: [1, 2, 4, 16]
    num_experts: 4
    top_k: 1
    use_curiosity: false  # set true to enable the Bayesian Router
    load_balancing: true

  cross_validation:
    n_splits: 5
    shuffle: true
    random_seed: 42

Supported schemes:

  • 1-bit to 16-bit: Symmetric quantization with scale factors
  • BitNet: Ternary weights {-1, 0, 1} with per-channel scaling
  • qesc: Bitwise popcount with 2-bit ternary encoding
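The first two schemes can be illustrated with a simplified NumPy sketch (the repo's BitLinear and qesc layers add per-channel scaling and bitwise popcount kernels, which are omitted here):

```python
import numpy as np

def symmetric_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """k-bit symmetric quantization with a single scale factor (bits >= 2)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale  # dequantized weights

def ternary_quantize(w: np.ndarray) -> np.ndarray:
    """BitNet-style ternary weights {-1, 0, 1} scaled by the mean magnitude."""
    scale = np.abs(w).mean()
    q = np.clip(np.round(w / scale), -1, 1)
    return q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
print("8-bit max abs error:", np.abs(w - symmetric_quantize(w, 8)).max())
print("ternary max abs error:", np.abs(w - ternary_quantize(w)).max())
```

The ternary case shows why BitNet compresses so aggressively: each weight needs only 2 bits plus one shared scale per channel.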

Reproducing ECCV 2026 Paper Results

All MoE-C (curiosity routing) experiments use the corrected Eq. 8 implementation. Results are stored in outputs/ and docs-temp/eccv-paper/results/.

Datasets

  • ESC-50: /path/to/ESC-50/efficientnet_1536/esc-50.csv
  • Quinn: /path/to/Quinn/efficientnet_ABGQI/ABGQI_embeddings_torch.csv
  • UrbanSound8K: /path/to/Urban8k/efficientnet/urbansound8k.csv

3-Expert MoE-C (BitNet-Q4/8-QMoE-C, Table 1 main config)

# Baseline (no curiosity), run per dataset
python scripts/qMoE/qmoe.py \
  --config-path /path/to/curious-qmoe/config --config-name esc50 \
  "experiment.router.expert_quantizations=['bitnet', 4, 8]" \
  experiment.router.num_experts=3 experiment.router.use_curiosity=false \
  experiment.datasets.esc.csv=/path/to/dataset.csv \
  experiment.device=cpu experiment.metadata.tag=<dataset>_baseline

# Curiosity α=0.2
python scripts/qMoE/qmoe.py \
  --config-path /path/to/curious-qmoe/config --config-name esc50 \
  "experiment.router.expert_quantizations=['bitnet', 4, 8]" \
  experiment.router.num_experts=3 experiment.router.use_curiosity=true \
  experiment.router.curiosity_strategy=kl_divergence \
  experiment.router.curiosity_alpha=0.2 experiment.router.mc_samples=10 \
  experiment.datasets.esc.csv=/path/to/dataset.csv \
  experiment.device=cpu experiment.metadata.tag=<dataset>_alpha_020

# Curiosity α=0.3
python scripts/qMoE/qmoe.py \
  --config-path /path/to/curious-qmoe/config --config-name esc50 \
  "experiment.router.expert_quantizations=['bitnet', 4, 8]" \
  experiment.router.num_experts=3 experiment.router.use_curiosity=true \
  experiment.router.curiosity_strategy=kl_divergence \
  experiment.router.curiosity_alpha=0.3 experiment.router.mc_samples=10 \
  experiment.datasets.esc.csv=/path/to/dataset.csv \
  experiment.device=cpu experiment.metadata.tag=<dataset>_alpha_030

4-Expert MoE-C (BitNet-Q4/8/16-QMoE-C)

python scripts/qMoE/qmoe.py \
  --config-path /path/to/curious-qmoe/config --config-name esc50 \
  "experiment.router.expert_quantizations=['bitnet', 4, 8, 16]" \
  experiment.router.num_experts=4 experiment.router.use_curiosity=true \
  experiment.router.curiosity_strategy=kl_divergence \
  experiment.router.curiosity_alpha=0.3 experiment.router.mc_samples=10 \
  experiment.datasets.esc.csv=/path/to/dataset.csv \
  experiment.device=cpu experiment.metadata.tag=<dataset>_bitnet_4_8_16_curiosity

PTQ MoE-C Variants

# BitNet-Q8/16-PTQ-QMoE-C (4 experts: bitnet, 8, 16, qesc)
python scripts/qMoE/qmoe.py \
  --config-path /path/to/curious-qmoe/config --config-name esc50 \
  "experiment.router.expert_quantizations=['bitnet', '8', '16', 'qesc']" \
  experiment.router.num_experts=4 experiment.router.use_curiosity=true \
  experiment.router.curiosity_strategy=kl_divergence \
  experiment.router.curiosity_alpha=0.3 experiment.router.mc_samples=10 \
  experiment.datasets.esc.csv=/path/to/dataset.csv \
  experiment.device=cpu experiment.metadata.tag=<dataset>_bitnet_8_16_qesc_curiosity

# BitNet-Q8PTQ-QMoE-C (2 experts: bitnet, qesc)
python scripts/qMoE/qmoe.py \
  --config-path /path/to/curious-qmoe/config --config-name esc50 \
  "experiment.router.expert_quantizations=['bitnet', 'qesc']" \
  experiment.router.num_experts=2 experiment.router.use_curiosity=true \
  experiment.router.curiosity_strategy=kl_divergence \
  experiment.router.curiosity_alpha=0.3 experiment.router.mc_samples=10 \
  experiment.datasets.esc.csv=/path/to/dataset.csv \
  experiment.device=cpu experiment.metadata.tag=<dataset>_bitnet_qesc_curiosity

Generate Rebuttal Figure

python scripts/analysis/generate_rebuttal_figure.py \
  --routing-json outputs-rebuttal/analysis/outputs-0.2/rebuttal_routing/routing_results.json \
  --output docs-temp/eccv-paper/figs/rebuttal_confidence_distribution.pdf

Output Tags → Paper Model Names

Tag pattern                    Paper name
*_baseline                     Baseline MoE (uniform routing)
*_alpha_020                    BitNet-Q4/8-QMoE-C (α=0.2)
*_alpha_030                    BitNet-Q4/8-QMoE-C (α=0.3)
*_bitnet_4_8_16_curiosity      BitNet-Q4/8/16-QMoE-C
*_bitnet_8_16_qesc_curiosity   BitNet-Q8/16-PTQ-QMoE-C
*_bitnet_qesc_curiosity        BitNet-Q8PTQ-QMoE-C

Each experiment outputs summary.json with f1_mean, f1_std, per-fold results, carbon emissions, and timing data.
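Those summary.json files can be aggregated across tags. A minimal sketch, using the f1_mean/f1_std keys described above but assuming each run writes its summary under a tag-named folder inside outputs/ (the directory layout is an assumption); a demo file is created first so the snippet is self-contained:

```python
import json
from pathlib import Path

# Build a tiny demo tree standing in for real run outputs.
demo = Path("outputs/esc_moe_curiosity")
demo.mkdir(parents=True, exist_ok=True)
(demo / "summary.json").write_text(json.dumps({"f1_mean": 0.921, "f1_std": 0.004}))

# Collect (tag, f1_mean, f1_std) for every run found under outputs/.
rows = []
for summary in Path("outputs").rglob("summary.json"):
    data = json.loads(summary.read_text())
    rows.append((summary.parent.name, data["f1_mean"], data["f1_std"]))

for tag, mean, std in sorted(rows):
    print(f"{tag:40s} F1 = {mean:.3f} ± {std:.3f}")
```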


License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).


Citation

@article{ordonez2025uncertainty,
  title={Uncertainty Makes It Stable: Curiosity-Driven Quantized Mixture-of-Experts},
  author={Ord{\'o}{\~n}ez, Sebasti{\'a}n Andr{\'e}s Cajas and Torres, Luis Fernando Torres and Meni, Mackenzie J and Paredes, Carlos Andr{\'e}s Duran and Arazo, Eric and Bosch, Cristian and Carbajo, Ricardo Simon and Lai, Yuan and Celi, Leo Anthony},
  journal={arXiv preprint arXiv:2511.11743},
  year={2025}
}
