Train a DDSP (Differentiable Digital Signal Processing) model on your own audio, export its learned spectral controls, and sonify / transform them in SuperCollider through a graphical control panel.
This repo bridges Google's DDSP training stack and a custom SuperCollider performance environment: DDSP learns how a sound is built (harmonics + filtered noise); Python extracts and reshapes that data; SuperCollider plays it back with real-time pitch, stretch, bounce, and mix controls via GUI.
DDSP (Engel et al., ICLR 2020) is a neural audio synthesis framework. Instead of generating raw waveforms directly, a small RNN predicts control parameters for classical DSP building blocks:
| Component | Role |
|---|---|
| Harmonic oscillator bank | Up to 60 sinusoidal partials with time-varying frequency and amplitude |
| Filtered noise | 65 STFT-like noise bands shaping the noisy / breathy / percussive part of the timbre |
| Reverb | Trainable room effect (solo-instrument model) |
The model is trained on short audio chunks (default 4 s at 16 kHz). At inference, it decodes f0, loudness, and spectral envelopes frame-by-frame, then resynthesizes audio through the harmonic + noise processors.
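The harmonic half of that resynthesis can be sketched for a single control frame. This is a simplified illustration, not the DDSP library's code: the function names are hypothetical, the controls are held constant across the frame (DDSP interpolates them), and the noise branch is omitted. It assumes partial k sits at k × f0, partials at or above Nyquist are dropped, and the remaining weights are renormalized before scaling by the overall amplitude.

```python
import math


def harmonic_frame(f0_hz, amp, distribution, sample_rate=16000):
    """Return (frequency_hz, amplitude) pairs for one control frame (hypothetical helper)."""
    nyquist = sample_rate / 2.0
    # Partial k sits at k * f0; drop anything at or above Nyquist to avoid aliasing.
    kept = [(k * f0_hz, weight)
            for k, weight in enumerate(distribution, start=1)
            if k * f0_hz < nyquist]
    total = sum(weight for _, weight in kept)
    if total <= 0.0:
        return []
    # Renormalize the surviving weights to sum to 1, then scale by the overall amplitude.
    return [(freq, amp * weight / total) for freq, weight in kept]


def render_frame(partials, n_samples, sample_rate=16000):
    """Sum the sinusoids for one frame, holding the controls constant."""
    return [sum(a * math.sin(2.0 * math.pi * f * n / sample_rate) for f, a in partials)
            for n in range(n_samples)]


partials = harmonic_frame(f0_hz=200.0, amp=0.5, distribution=[1.0] * 60)
print(len(partials))  # 39 partials fit below the 8 kHz Nyquist limit
```

With f0 at 200 Hz and a 16 kHz sample rate, only partials 1 to 39 survive, which is why high-pitched material uses far fewer than the 60 available harmonics.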
In this project, the solo-instrument architecture is defined in ginn/models/solo_instrument.gin:
RnnFcDecoder.output_splits = (('amps', 1),
('harmonic_distribution', 60),
('noise_magnitudes', 65))
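To make the split sizes concrete, the sketch below slices a flat 126-value decoder output (1 + 60 + 65) into the three named controls. It is a plain-Python illustration of what the configuration implies, not the DDSP implementation; `split_outputs` is a hypothetical helper and the frame contents are dummy values.

```python
# Mirrors the gin setting above; the splitting logic itself is an illustrative assumption.
OUTPUT_SPLITS = (("amps", 1),
                 ("harmonic_distribution", 60),
                 ("noise_magnitudes", 65))


def split_outputs(vector, splits=OUTPUT_SPLITS):
    """Slice one flat decoder frame into named control groups."""
    expected = sum(size for _, size in splits)
    if len(vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(vector)}")
    controls, start = {}, 0
    for name, size in splits:
        controls[name] = vector[start:start + size]
        start += size
    return controls


frame = [float(i) for i in range(126)]  # dummy decoder output for one frame
controls = split_outputs(frame)
print({name: len(values) for name, values in controls.items()})
```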
The goal is not only to reproduce training audio, but to extract the control data and use it as a flexible spectral instrument in SuperCollider.
source_audio/*.wav
│
▼ [Docker + DDSP training]
training_out/ ← TensorFlow checkpoints
│
▼ [2-chunks2envelopes.py]
envelopes/*.npz ← per-chunk harmonic + noise control data (+ WAV previews)
│
▼ [optional: enrich_spectrum.py, modify_envelopes.py]
envelopes/*_enriched.npz
│
▼ [3-envelopes2csv.py]
csv_exports/*_unified_*.csv ← canonical format for SuperCollider
│
▼ [SuperCollider GUI app]
real-time sonification, pitch/stretch/bounce, visualization
After training and envelope extraction, the repo produces three layers of data. Understanding these formats is central to using the system.
Checkpoints (training_out/): TensorFlow model checkpoints and gin operative configs (operative_config-*.gin). They are used only for re-extracting envelopes and are not consumed by SuperCollider directly.
Envelopes (envelopes/*.npz): the primary NumPy export from a trained checkpoint. Each chunk (4 s of source audio) becomes one .npz file plus companion WAVs.
.npz arrays:
| Key | Shape | Description |
|---|---|---|
| frequency_envelopes | [n_samples, 60] | Per-harmonic instantaneous frequency (Hz), sample-rate resolution |
| amplitude_envelopes | [n_samples, 60] | Per-harmonic amplitude |
| noise_magnitudes_frames | [1, n_frames, 65] | Filtered-noise band energies at model frame rate |
| f0_hz_frames | [n_frames] | Fundamental frequency per frame |
| sample_rate | scalar | e.g. 16000 |
| frame_rate | scalar | Model control rate, e.g. ~250 Hz |
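The arrays above live at two time resolutions: n_samples for the per-harmonic envelopes and n_frames for f0 and noise. The sketch below shows how the two relate, using illustrative round numbers (16 kHz, 250 Hz, 4 s). The helper names are hypothetical, and simple decimation is an assumption; the repo's scripts may average or interpolate instead.

```python
def hop_size(sample_rate, frame_rate):
    """Number of audio samples covered by one control frame (rounded)."""
    return max(1, round(sample_rate / frame_rate))


def to_frame_rate(envelope, sample_rate, frame_rate):
    """Decimate a sample-rate envelope by keeping the first sample of each frame."""
    return envelope[::hop_size(sample_rate, frame_rate)]


sample_rate, frame_rate = 16000, 250       # illustrative values, not read from a file
n_samples = 4 * sample_rate                # one 4 s chunk
envelope = [i / n_samples for i in range(n_samples)]  # dummy rising ramp

frames = to_frame_rate(envelope, sample_rate, frame_rate)
print(hop_size(sample_rate, frame_rate), len(frames))  # 64 1000
```

A 4 s chunk therefore shrinks from 64,000 values per harmonic to 1,000, which is the size reduction that frame-rate export relies on.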
Companion WAV files (per chunk):
| File | Content |
|---|---|
| chunk_XX_1oscbank.wav | Harmonics only |
| chunk_XX_1noise.wav | Noise only |
| chunk_XX_1oscbank_noise.wav | Harmonic + noise mix |
These are useful for auditioning what DDSP learned before moving to SuperCollider.
Unified CSV (csv_exports/*_unified_*.csv): the canonical interchange format for SuperCollider. Harmonics and noise bands are merged into one file, synchronized by frame_index.
Metadata line (comment):
# frame_rate=250.57,sample_rate=44100

Columns:
frame_index,f0_hz,component_type,component_index,frequency,value
0,203.77,harmonic,0,203.77,0.000577
0,203.77,harmonic,1,407.53,0.006631
0,203.77,noise_band,0,123.08,0.001200

| Column | Meaning |
|---|---|
| frame_index | Time frame (harmonics and noise aligned) |
| f0_hz | Fundamental at this frame |
| component_type | harmonic or noise_band |
| component_index | Partial index (0–59 harmonics, 0–64 noise bands) |
| frequency | Hz (instantaneous for harmonics; band center for noise) |
| value | Amplitude (harmonics) or magnitude (noise) |
By default, 3-envelopes2csv.py downsamples from sample rate to frame rate to keep files manageable. Use --full-resolution or python/downsample_csv.py to tune size vs. fidelity.
See docs/CSV_FORMAT_GUIDE.md for full specification.
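For inspecting an export outside SuperCollider, the following sketch reads the unified format with only the Python standard library. It uses the sample rows shown above rather than a real file. `read_unified_csv` is a hypothetical helper, and it assumes the metadata line is a single `#` comment of comma-separated key=value pairs with numeric values.

```python
import csv
import io
from collections import defaultdict

SAMPLE = """\
# frame_rate=250.57,sample_rate=44100
frame_index,f0_hz,component_type,component_index,frequency,value
0,203.77,harmonic,0,203.77,0.000577
0,203.77,harmonic,1,407.53,0.006631
0,203.77,noise_band,0,123.08,0.001200
"""


def read_unified_csv(handle):
    """Return (metadata dict, {frame_index: {"harmonic": [...], "noise_band": [...]}})."""
    meta, data_lines = {}, []
    for line in handle:
        if line.startswith("#"):
            # Metadata comment: comma-separated key=value pairs.
            for pair in line.lstrip("#").strip().split(","):
                key, _, value = pair.partition("=")
                meta[key.strip()] = float(value)
        elif line.strip():
            data_lines.append(line)
    frames = defaultdict(lambda: {"harmonic": [], "noise_band": []})
    for row in csv.DictReader(data_lines):
        component = (int(row["component_index"]),
                     float(row["frequency"]),
                     float(row["value"]))
        frames[int(row["frame_index"])][row["component_type"]].append(component)
    return meta, dict(frames)


meta, frames = read_unified_csv(io.StringIO(SAMPLE))
print(meta["frame_rate"], len(frames[0]["harmonic"]), len(frames[0]["noise_band"]))
```

To read a real export, pass an open file handle instead of the `io.StringIO` wrapper. A frame's time in seconds is `frame_index / meta["frame_rate"]`.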
Between .npz and CSV export, several tools reshape the learned spectra:
| Script | Purpose |
|---|---|
| python/enrich_spectrum.py | Extend 8 kHz DDSP bandwidth to full spectrum (44.1 kHz) via harmonic extrapolation and shaped noise |
| python/modify_envelopes.py | Rescale frequency range, resample duration, fix pitch, remap partials |
| python/downsample_csv.py | Shrink CSVs for faster SC experimentation (--frame-step, --max-harmonics, --max-noise-bands) |
| python/single_batch_export.py | Batch-combine multiple chunks into one unified CSV |
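The three downsample_csv.py flags suggest a simple row filter. The sketch below is one guess at that behaviour, not the script's actual code: keep every Nth frame and drop components whose index exceeds a cap. The keyword names mirror the flags, while `downsample_rows` and the dummy rows are hypothetical.

```python
def downsample_rows(rows, frame_step=1, max_harmonics=None, max_noise_bands=None):
    """Keep every `frame_step`-th frame and cap the component count per type."""
    limits = {"harmonic": max_harmonics, "noise_band": max_noise_bands}
    kept = []
    for row in rows:
        if row["frame_index"] % frame_step != 0:
            continue  # frame skipped by the step
        limit = limits.get(row["component_type"])
        if limit is not None and row["component_index"] >= limit:
            continue  # component beyond the requested cap
        kept.append(row)
    return kept


# Dummy data: 4 frames, each with 3 harmonics and 2 noise bands.
rows = [
    {"frame_index": f, "component_type": kind, "component_index": i}
    for f in range(4)
    for kind, count in (("harmonic", 3), ("noise_band", 2))
    for i in range(count)
]
small = downsample_rows(rows, frame_step=2, max_harmonics=2, max_noise_bands=1)
print(len(rows), len(small))  # 20 6
```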
Typical enrichment workflow:
cd training
python enrich_spectrum.py \
--input envelopes/chunk_00_envelopes.npz \
--output envelopes/chunk_00_enriched.npz
python 3-envelopes2csv.py --chunk-key envelopes/chunk_00_enriched

The heart of the performance layer is sc/harmonic_noise_unified_controller_bouncing.scd (recommended). Evaluating this file opens a full graphical application, so everyday use requires no code edits.
- Open SuperCollider.
- Evaluate the entire .scd file (Cmd+Enter on the block).
- Two windows appear:
  - Unified Controller: control panel (sliders, buttons, file import)
  - Unified Harmonic + Noise Visualization: live spectral plot
The control panel is organized into sections:
| Section | Controls |
|---|---|
| Data file | Import CSV button — load any unified CSV at runtime |
| Temperament | TET size (12 = semitones, 24 = quarter-tones, 53 = Turkish/Arabic, etc.) |
| Master controls | Master pitch (±48 steps), spectral stretch (0.25×–4×), anchor mode (center partial / fixed f0 / manual Hz) |
| Volume & mix | Harmonic volume, partial gain, noise volume, noise magnitude scale, control smoothing |
| Conductor | Start frame, frame count, update rate — read a window of the CSV on a clock |
| Sonification | Playback rate, Start / Stop |
| Bouncing | Timed spectral-stretch "bounce" with min/max scale, speed, jitter, duration |
| Utilities | Reset all, open visualization, preset stretch factors |
All transforms apply in real time to both harmonics and noise bands in parallel. Pitch uses configurable equal temperament; stretch uses log-frequency scaling around a selectable anchor.
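The two frequency transforms can be written out as formulas. The sketch below is an assumption about the math, not a port of the .scd code: pitch multiplies every frequency by 2^(steps / TET), and stretch scales each frequency's log-distance from the anchor. The function names and the order of operations (stretch first, then pitch) are illustrative choices.

```python
import math


def pitch_ratio(steps, tet=12):
    """Frequency ratio for `steps` steps of an equal temperament with `tet` divisions per octave."""
    return 2.0 ** (steps / tet)


def stretch_frequency(freq_hz, stretch, anchor_hz):
    """Scale the log-frequency distance from the anchor by `stretch`.

    log f' = log anchor + stretch * (log f - log anchor)
    """
    return anchor_hz * (freq_hz / anchor_hz) ** stretch


def transform(freq_hz, steps=0, tet=12, stretch=1.0, anchor_hz=440.0):
    """Apply stretch around the anchor, then transpose (assumed order)."""
    return stretch_frequency(freq_hz, stretch, anchor_hz) * pitch_ratio(steps, tet)


print(pitch_ratio(12, tet=12))                # 2.0: twelve 12-TET steps are one octave
print(stretch_frequency(880.0, 2.0, 440.0))   # 1760.0: one octave above the anchor becomes two
```

Under this formulation the anchor frequency is a fixed point of the stretch, so the choice of anchor determines which part of the spectrum stays put while the rest expands or contracts.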
| File | Role |
|---|---|
| sc/harmonic_noise_unified_controller.scd | Earlier version (hardcoded CSV path) |
| sc/spectral_mapping.scd | Map learned partials onto user-defined target spectra |
# After CSV export:
cp training/csv_exports/chunk_00_unified_*.csv examples/
# In SuperCollider: evaluate sc/harmonic_noise_unified_controller_bouncing.scd
# Then click "Import CSV..." and select your file

Full step-by-step details: training/README.md.
# 1. Place .wav files in training/source_audio/
# 2. Start GPU Docker container
cd training
./0-start_docker.sh
# inside container:
./1-start_training.sh
# 3. Extract envelopes (host, with DDSP env)
# activate your DDSP environment
python 2-chunks2envelopes.py
# 4. Export unified CSV
python 3-envelopes2csv.py --all-chunks
# 5. Play in SuperCollider (see above)

ddsp2sc/
├── ginn/ # DDSP gin configs (solo_instrument, datasets, eval)
├── training/ # Training workspace (Docker, scripts, outputs)
│ ├── source_audio/ # Your training WAVs
│ ├── prepared/ # TFRecord shards
│ ├── training_out/ # Checkpoints
│ ├── envelopes/ # .npz + preview WAVs
│ └── csv_exports/ # Unified CSVs for SuperCollider
├── python/ # Export, enrichment, modification utilities
├── sc/ # SuperCollider GUI sonification apps
├── docs/ # CSV format & downsampling guides
├── docker/ # Docker image definition
└── examples/ # Sample CSVs for SC (copy exports here)
Training (Docker): TensorFlow + DDSP — provided by the training Docker image (training/0-start_docker.sh).
Export / manipulation (host):
pip install numpy scipy pandas soundfile

Sonification: SuperCollider 3.12+

- docs/CSV_FORMAT_GUIDE.md: unified CSV specification
- docs/DOWNSAMPLE_README.md: reducing CSV size for experimentation
- training/README.md: training directory layout and commands
DDSP:
@inproceedings{engel2020ddsp,
title={DDSP: Differentiable Digital Signal Processing},
author={Jesse Engel and Lamtharn (Hanoi) Hantrakul and Chenjie Gu and Adam Roberts},
booktitle={ICLR},
year={2020}
}