Nonparametric Data Attribution for Diffusion Models (NDA)

This repository contains the official implementation of the paper "Nonparametric Data Attribution for Diffusion Models" (arXiv:2510.14269).

Overview

Getting Started

To run the experiments, you'll need to set up the appropriate environment.

Clone the Repository:
```
git clone https://github.com/sail-sg/NDA.git
cd NDA
```

Create a Conda Environment:
```
conda create -n nda python=3.9 -y
conda activate nda
```

Install Dependencies:
```
pip install -r requirements.txt
```

Reproducing Experiments

We provide commands for running the experiments on CIFAR-2; the same workflow can be transferred to other datasets.

Navigate to the Dataset Directory:
```
cd CIFAR2
```
Prepare the Dataset: Run the 00_EDA.ipynb notebook to generate the CIFAR2 dataset and prepare the necessary subsets for LDS.
Train the Diffusion Model: Use the provided script to train a diffusion model.
```
bash scripts/run_train.sh 0 18888 5000-0.5
```
Generate Images: After training, generate images using:
```
bash scripts/run_gen.sh 0 0 5000-0.5
```
Compute Attribution Score: Use the following scripts to compute attribution scores on either the generated set (gen) or the validation set (val).

Key Parameters

- t_fixed: diffusion timestep, e.g., 100, 200, 300, 400, or 500.
- patch_size: patch size at the original image scale.
- topk: number of top-scoring patches kept for each image.
- Downscale / two-scale only:
  - patch_size_down and alpha: the patch size at the downscaled resolution, and the ratio between the single-scale weight and the downscaled weight.
- gen_source: the set to attribute, either gen or val.
- mask_value: penalty applied to invalid patch locations.

kernel_batch_size: controls the GPU memory footprint. Larger values run faster but use more VRAM; smaller values reduce memory usage at the cost of speed. Use the largest value that does not cause an out-of-memory (OOM) error on your GPU.
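The speed/memory trade-off behind kernel_batch_size is the usual one for chunked pairwise computations. The sketch below is a generic NumPy illustration, not the repository's code: it assumes each query and each training item is already a feature vector, uses a plain dot product as a stand-in score (hypothetical, not the NDA kernel), and keeps the K best training items per query (the repository's topk selects patches instead). Only a (kernel_batch_size, n_train) block of scores exists at any time, so peak memory grows with kernel_batch_size while the number of loop iterations shrinks. The helper name topk_scores_chunked is hypothetical.

```python
import numpy as np


def topk_scores_chunked(query, train, k, kernel_batch_size):
    """Score every query row against every training row in chunks and keep
    the k highest scores per query.

    Hypothetical illustration: `query` is (n_query, d), `train` is (n_train, d),
    and the score is a plain dot product (a stand-in, NOT the NDA kernel).
    Only a (kernel_batch_size, n_train) score block is held in memory at once.
    """
    n_query = query.shape[0]
    top_idx = np.empty((n_query, k), dtype=np.int64)
    top_val = np.empty((n_query, k), dtype=query.dtype)
    for start in range(0, n_query, kernel_batch_size):
        stop = min(start + kernel_batch_size, n_query)
        block = query[start:stop] @ train.T  # (batch, n_train) scores
        # argpartition places the k largest scores per row in the first k slots
        idx = np.argpartition(-block, k - 1, axis=1)[:, :k]
        vals = np.take_along_axis(block, idx, axis=1)
        # sort the k survivors from highest to lowest score
        order = np.argsort(-vals, axis=1)
        top_idx[start:stop] = np.take_along_axis(idx, order, axis=1)
        top_val[start:stop] = np.take_along_axis(vals, order, axis=1)
    return top_idx, top_val
```

Calling it with kernel_batch_size=4 or kernel_batch_size=10 on the same inputs returns the same indices; only the peak memory and the number of iterations change.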

```
bash scripts/run_score_orig.sh 0 5000-0.5 0 1000 100 7 100 gen 10 64
bash scripts/run_score_orig.sh 0 5000-0.5 0 1000 100 7 100 val 10 64
bash scripts/run_score_downscale.sh 0 5000-0.5 0 1000 100 14 100 gen 10 64
bash scripts/run_score_downscale.sh 0 5000-0.5 0 1000 100 14 100 val 10 64
bash scripts/run_score_twoscale.sh 0 5000-0.5 0 1000 100 14 100 10 0.5 gen 10 64
bash scripts/run_score_twoscale.sh 0 5000-0.5 0 1000 100 14 100 10 0.5 val 10 64
```

LDS Benchmark

Train 64 models corresponding to 64 subsets of the training set:
```
bash scripts/run_lds_val_sub.sh 0 18888 5000-0.5 0 63
```
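The 64 LDS subsets are prepared by 00_EDA.ipynb. Assuming (an interpretation, not stated in this README) that the 5000-0.5 argument means a 5,000-image training set with each subset containing half of it, the subsets can be pictured as independent random halves of the training indices. A minimal sketch of such sampling follows; the helper name sample_lds_subsets is hypothetical, and the notebook's actual sampling procedure and seeds may differ.

```python
import numpy as np


def sample_lds_subsets(n_train, fraction, n_subsets, seed=0):
    """Draw `n_subsets` random subsets, each holding `fraction` of the
    `n_train` training indices, sampled without replacement.

    Returns a boolean mask of shape (n_subsets, n_train) where
    mask[j, i] is True if training example i belongs to subset j.
    """
    rng = np.random.default_rng(seed)
    subset_size = int(round(fraction * n_train))
    mask = np.zeros((n_subsets, n_train), dtype=bool)
    for j in range(n_subsets):
        chosen = rng.choice(n_train, size=subset_size, replace=False)
        mask[j, chosen] = True
    return mask


# Assumed reading of "5000-0.5": 5,000 training images, half-size subsets.
masks = sample_lds_subsets(n_train=5000, fraction=0.5, n_subsets=64)
```

Each subset is drawn independently, so a given training image appears in roughly half of the 64 subsets, which is what lets LDS relate attribution scores to changes in the training set.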

Evaluate the model outputs on the validation set:
```
bash scripts/run_eval_lds_val_sub.sh 0 0 5000-0.5 idx_val.pkl 0 63
bash scripts/run_eval_lds_val_sub.sh 0 1 5000-0.5 idx_val.pkl 0 63
bash scripts/run_eval_lds_val_sub.sh 0 2 5000-0.5 idx_val.pkl 0 63
```

Evaluate the model outputs on the generated set:
```
bash scripts/run_eval_lds_val_sub.sh 0 0 5000-0.5 idx_gen.pkl 0 63
bash scripts/run_eval_lds_val_sub.sh 0 1 5000-0.5 idx_gen.pkl 0 63
bash scripts/run_eval_lds_val_sub.sh 0 2 5000-0.5 idx_gen.pkl 0 63
```

Compute the LDS scores by running the notebooks in CIFAR2/evaluation:
- CIFAR2/evaluation/lds_gen.ipynb
- CIFAR2/evaluation/lds_val.ipynb
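For reference, the linear datamodeling score (LDS) is commonly defined as follows: for each test (validation or generated) sample, predict the model output on each subset as the sum of the attribution scores of the training examples in that subset, take the Spearman rank correlation between these predictions and the outputs actually measured on the models trained on those subsets, and average over test samples. The NumPy sketch below implements that generic definition; the array layouts and function names are assumptions, and it is not the code in lds_gen.ipynb or lds_val.ipynb, which may differ in details such as the sign convention for the model output.

```python
import numpy as np


def rank(x):
    """Ranks along the last axis (0 = smallest); assumes no ties."""
    return np.argsort(np.argsort(x, axis=-1), axis=-1).astype(float)


def spearman_rows(a, b):
    """Row-wise Spearman correlation between a and b, both (n_rows, n_cols)."""
    ra = rank(a)
    rb = rank(b)
    ra -= ra.mean(axis=1, keepdims=True)
    rb -= rb.mean(axis=1, keepdims=True)
    num = (ra * rb).sum(axis=1)
    den = np.sqrt((ra ** 2).sum(axis=1) * (rb ** 2).sum(axis=1))
    return num / den


def lds(scores, masks, outputs):
    """Linear datamodeling score (generic definition, hypothetical layout).

    scores  : (n_test, n_train) attribution scores
    masks   : (n_subsets, n_train) boolean subset membership
    outputs : (n_test, n_subsets) output measured on each subset-trained model
    """
    # Additive prediction: sum of scores over the training examples in each subset.
    predicted = scores @ masks.T.astype(float)  # (n_test, n_subsets)
    return spearman_rows(predicted, outputs).mean()
```

Because Spearman correlation only compares rankings, any monotonically increasing transform of the predictions yields an LDS of 1, while a reversed ordering yields -1.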

Citation

If you find this project useful in your research, please consider citing our paper:

```
@article{zhao2025nonparametricdataattributiondiffusion,
  title={Nonparametric Data Attribution for Diffusion Models},
  author={Yutian Zhao and Chao Du and Xiaosen Zheng and Tianyu Pang and Min Lin},
  journal={arXiv preprint arXiv:2510.14269},
  year={2025}
}
```

Acknowledgement

This repository is partially adapted from the official implementation of "Intriguing Properties of Data Attribution on Diffusion Models". We thank the authors for making their code and LDS dataset available.
