ayk-caglayan/ddsp2sc

ddsp2sc

Train a DDSP (Differentiable Digital Signal Processing) model on your own audio, export its learned spectral controls, and sonify / transform them in SuperCollider through a graphical control panel.

This repo bridges Google's DDSP training stack and a custom SuperCollider performance environment: DDSP learns how a sound is built (harmonics + filtered noise); Python extracts and reshapes that data; SuperCollider plays it back with real-time pitch, stretch, bounce, and mix controls via a GUI.


What is DDSP?

DDSP (Engel et al., ICLR 2020) is a neural audio synthesis framework. Instead of generating raw waveforms directly, a small RNN predicts control parameters for classical DSP building blocks:

| Component | Role |
| --- | --- |
| Harmonic oscillator bank | Up to 60 sinusoidal partials with time-varying frequency and amplitude |
| Filtered noise | 65 STFT-like noise bands shaping the noisy / breathy / percussive part of the timbre |
| Reverb | Trainable room effect (solo-instrument model) |

The model is trained on short audio chunks (default 4 s at 16 kHz). At inference, it takes f0 and loudness frame-by-frame, decodes the overall amplitude, harmonic distribution, and noise magnitudes, then resynthesizes audio through the harmonic + noise processors.
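To make the harmonic half of that resynthesis concrete, here is a minimal pure-Python sketch of an additive oscillator bank driven by per-sample frequency and amplitude envelopes. It is not DDSP's implementation; `synthesize_harmonics` and its arguments are hypothetical names chosen for illustration, and the filtered-noise branch is left out.

```python
import math


def synthesize_harmonics(freqs_hz, amps, sample_rate=16000):
    """Sum sinusoidal partials from per-sample envelopes.

    freqs_hz, amps: lists of per-sample rows, one value per partial.
    (Hypothetical helper, not part of DDSP or this repo.)
    """
    n_partials = len(freqs_hz[0])
    phases = [0.0] * n_partials
    nyquist = sample_rate / 2.0
    audio = []
    for f_row, a_row in zip(freqs_hz, amps):
        sample = 0.0
        for k in range(n_partials):
            # Integrate instantaneous frequency into phase (radians).
            phases[k] += 2.0 * math.pi * f_row[k] / sample_rate
            # Skip partials at or above Nyquist to avoid aliasing.
            if f_row[k] < nyquist:
                sample += a_row[k] * math.sin(phases[k])
        audio.append(sample)
    return audio


# Three partials of a steady 200 Hz tone, 10 ms at 16 kHz.
n = 160
freqs = [[200.0, 400.0, 600.0]] * n
amps = [[0.5, 0.25, 0.125]] * n
audio = synthesize_harmonics(freqs, amps)
```

Accumulating phase from the instantaneous frequency, rather than computing `sin(2*pi*f*t)` directly, keeps the waveform continuous when the frequency envelope changes over time.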

In this project, the solo-instrument architecture is defined in ginn/models/solo_instrument.gin:

RnnFcDecoder.output_splits = (('amps', 1),
                              ('harmonic_distribution', 60),
                              ('noise_magnitudes', 65))
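Read literally, these splits add up to 1 + 60 + 65 = 126 values per frame. The sketch below shows how such a vector could be sliced into named parts; `split_frame` is a hypothetical helper for illustration and not DDSP's API.

```python
OUTPUT_SPLITS = (
    ("amps", 1),
    ("harmonic_distribution", 60),
    ("noise_magnitudes", 65),
)


def split_frame(vector, splits=OUTPUT_SPLITS):
    """Slice one flat per-frame vector into named segments (illustrative only)."""
    expected = sum(size for _, size in splits)
    if len(vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(vector)}")
    parts = {}
    start = 0
    for name, size in splits:
        parts[name] = vector[start:start + size]
        start += size
    return parts


# A dummy frame whose values are just their own indices.
frame = [float(i) for i in range(126)]
parts = split_frame(frame)
```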

The goal is not only to reproduce training audio, but to extract the control data and use it as a flexible spectral instrument in SuperCollider.


Pipeline overview

source_audio/*.wav
       │
       ▼  [Docker + DDSP training]
training_out/          ← TensorFlow checkpoints
       │
       ▼  [2-chunks2envelopes.py]
envelopes/*.npz        ← per-chunk harmonic + noise control data (+ WAV previews)
       │
       ▼  [optional: enrich_spectrum.py, modify_envelopes.py]
envelopes/*_enriched.npz
       │
       ▼  [3-envelopes2csv.py]
csv_exports/*_unified_*.csv   ← canonical format for SuperCollider
       │
       ▼  [SuperCollider GUI app]
real-time sonification, pitch/stretch/bounce, visualization

DDSP training outputs (important)

After training and envelope extraction, the repo produces three layers of data. Understanding these formats is central to using the system.

1. Checkpoints — training/training_out/

TensorFlow model checkpoints and gin operative configs (operative_config-*.gin). Used only for re-extracting envelopes; not consumed by SuperCollider directly.

2. Envelope archives — training/envelopes/*_envelopes.npz

The primary NumPy export from a trained checkpoint. Each chunk (4 s of source audio) becomes one .npz file plus companion WAVs.

.npz arrays:

| Key | Shape | Description |
| --- | --- | --- |
| frequency_envelopes | [n_samples, 60] | Per-harmonic instantaneous frequency (Hz), sample-rate resolution |
| amplitude_envelopes | [n_samples, 60] | Per-harmonic amplitude |
| noise_magnitudes_frames | [1, n_frames, 65] | Filtered-noise band energies at model frame rate |
| f0_hz_frames | [n_frames] | Fundamental frequency per frame |
| sample_rate | scalar | e.g. 16000 |
| frame_rate | scalar | Model control rate, e.g. ~250 Hz |
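As a sanity check on these shapes, the following sketch computes what one chunk should look like under the example values above. It assumes a frame rate of exactly 250 Hz; real archives store their own `sample_rate` and `frame_rate`, which should be preferred. `expected_shapes` is a hypothetical helper.

```python
def expected_shapes(duration_s=4.0, sample_rate=16000, frame_rate=250.0,
                    n_harmonics=60, n_noise_bands=65):
    """Expected array shapes for one chunk (assumed values, for illustration)."""
    n_samples = int(round(duration_s * sample_rate))
    n_frames = int(round(duration_s * frame_rate))
    return {
        "frequency_envelopes": (n_samples, n_harmonics),
        "amplitude_envelopes": (n_samples, n_harmonics),
        "noise_magnitudes_frames": (1, n_frames, n_noise_bands),
        "f0_hz_frames": (n_frames,),
        # How many audio samples each control frame spans.
        "samples_per_frame": sample_rate / frame_rate,
    }


shapes = expected_shapes()
```

Under these assumptions a 4 s chunk holds 64000 samples and 1000 frames, so each control frame covers 64 samples.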

Companion WAV files (per chunk):

| File | Content |
| --- | --- |
| chunk_XX_1oscbank.wav | Harmonics only |
| chunk_XX_1noise.wav | Noise only |
| chunk_XX_1oscbank_noise.wav | Harmonic + noise mix |

These are useful for auditioning what DDSP learned before moving to SuperCollider.

3. Unified CSV — training/csv_exports/*_unified_*.csv

The canonical interchange format for SuperCollider. Harmonics and noise bands are merged into one file, synchronized by frame_index.

Metadata line (comment):

# frame_rate=250.57,sample_rate=44100

Columns:

frame_index,f0_hz,component_type,component_index,frequency,value
0,203.77,harmonic,0,203.77,0.000577
0,203.77,harmonic,1,407.53,0.006631
0,203.77,noise_band,0,123.08,0.001200

| Column | Meaning |
| --- | --- |
| frame_index | Time frame (harmonics and noise aligned) |
| f0_hz | Fundamental at this frame |
| component_type | harmonic or noise_band |
| component_index | Partial index (0–59 harmonics, 0–64 noise bands) |
| frequency | Hz (instantaneous for harmonics; band center for noise) |
| value | Amplitude (harmonics) or magnitude (noise) |
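A minimal reader for this layout can be written with the standard library alone. The sketch below parses the metadata comment and groups rows by frame; `read_unified_csv` is a hypothetical helper, and it assumes the file looks exactly like the sample rows shown above (docs/CSV_FORMAT_GUIDE.md remains the authoritative specification).

```python
import csv
import io

SAMPLE = """# frame_rate=250.57,sample_rate=44100
frame_index,f0_hz,component_type,component_index,frequency,value
0,203.77,harmonic,0,203.77,0.000577
0,203.77,harmonic,1,407.53,0.006631
0,203.77,noise_band,0,123.08,0.001200
"""


def read_unified_csv(handle):
    """Return (metadata dict, {frame_index: frame dict}) from a unified CSV."""
    meta = {}
    data_lines = []
    for line in handle:
        if line.startswith("#"):
            # Metadata comment: comma-separated key=value pairs.
            for pair in line.lstrip("# ").strip().split(","):
                key, _, value = pair.partition("=")
                meta[key.strip()] = float(value)
        elif line.strip():
            data_lines.append(line)

    frames = {}
    for row in csv.DictReader(data_lines):
        frame = frames.setdefault(
            int(row["frame_index"]),
            {"f0_hz": float(row["f0_hz"]), "harmonic": [], "noise_band": []},
        )
        # Each component is stored as (index, frequency in Hz, value).
        frame[row["component_type"]].append(
            (int(row["component_index"]), float(row["frequency"]), float(row["value"]))
        )
    return meta, frames


meta, frames = read_unified_csv(io.StringIO(SAMPLE))
```

For a real export, pass an open file handle in place of the `io.StringIO` wrapper.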

By default, 3-envelopes2csv.py downsamples from sample rate to frame rate to keep files manageable. Use --full-resolution or python/downsample_csv.py to tune size vs. fidelity.
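The size reduction follows from simple arithmetic: at 16 kHz and roughly 250 frames per second, each frame spans about 64 samples. The sketch below shows one plausible way to reduce sample-rate rows to frame-rate rows by picking the row at each frame boundary. This is an assumption for illustration; the actual script may average or interpolate instead, and `downsample_to_frames` is a hypothetical name.

```python
def downsample_to_frames(rows, sample_rate, frame_rate):
    """Keep one sample-rate row per control frame (simple decimation)."""
    hop = sample_rate / frame_rate          # samples per frame
    n_frames = int(len(rows) / hop)
    last = len(rows) - 1
    return [rows[min(int(round(i * hop)), last)] for i in range(n_frames)]


# Stand-in for 4 s of per-sample envelope rows: each "row" is its sample index.
rows = list(range(64000))
frames = downsample_to_frames(rows, sample_rate=16000, frame_rate=250.0)
```

With these example numbers, 64000 rows shrink to 1000, a 64-fold reduction before any further thinning of frames or partials.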

See docs/CSV_FORMAT_GUIDE.md for full specification.


Modifying outputs (Python)

Between .npz and CSV export, several tools reshape the learned spectra:

| Script | Purpose |
| --- | --- |
| python/enrich_spectrum.py | Extend the 8 kHz DDSP bandwidth (16 kHz sample rate) to the full spectrum (44.1 kHz sample rate) via harmonic extrapolation and shaped noise |
| python/modify_envelopes.py | Rescale frequency range, resample duration, fix pitch, remap partials |
| python/downsample_csv.py | Shrink CSVs for faster SC experimentation (--frame-step, --max-harmonics, --max-noise-bands) |
| python/single_batch_export.py | Batch-combine multiple chunks into one unified CSV |
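To illustrate what harmonic extrapolation can mean in practice, the sketch below appends partials above the last learned one until the target Nyquist is reached, attenuating them with a fixed roll-off. Everything here is an assumption: the function name, the strictly harmonic spacing, and the -12 dB/octave slope are illustrative and not taken from enrich_spectrum.py.

```python
import math


def extrapolate_harmonics(f0_hz, amps, target_sample_rate=44100,
                          rolloff_db_per_octave=-12.0):
    """Extend a harmonic series up to the target Nyquist (illustrative only)."""
    nyquist = target_sample_rate / 2.0
    n_learned = len(amps)
    freqs = [f0_hz * (k + 1) for k in range(n_learned)]
    out_amps = list(amps)
    # Continue from the highest learned partial.
    ref_freq, ref_amp = freqs[-1], amps[-1]
    k = n_learned
    while f0_hz * (k + 1) < nyquist:
        freq = f0_hz * (k + 1)
        octaves_above = math.log2(freq / ref_freq)
        gain = 10.0 ** (rolloff_db_per_octave * octaves_above / 20.0)
        freqs.append(freq)
        out_amps.append(ref_amp * gain)
        k += 1
    return freqs, out_amps


# 39 learned partials of a 200 Hz tone (top partial at 7.8 kHz), 1/k amplitudes.
learned = [1.0 / (k + 1) for k in range(39)]
freqs, new_amps = extrapolate_harmonics(200.0, learned)
```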

Typical enrichment workflow:

cd training
python enrich_spectrum.py \
  --input envelopes/chunk_00_envelopes.npz \
  --output envelopes/chunk_00_enriched.npz

python 3-envelopes2csv.py --chunk-key envelopes/chunk_00_enriched

SuperCollider: GUI sonification app

The heart of the performance layer is sc/harmonic_noise_unified_controller_bouncing.scd (recommended). Evaluating this file opens a full graphical application — no need to edit code for everyday use.

Launch

  1. Open SuperCollider.
  2. Evaluate the entire .scd file (place the cursor in the block and press Cmd+Enter, or Ctrl+Enter on Linux/Windows).
  3. Two windows appear:
    • Unified Controller — control panel (sliders, buttons, file import)
    • Unified Harmonic + Noise Visualization — live spectral plot

GUI features

The control panel is organized into sections:

| Section | Controls |
| --- | --- |
| Data file | Import CSV button: load any unified CSV at runtime |
| Temperament | TET size (12 = semitones, 24 = quarter-tones, 53 = Turkish/Arabic, etc.) |
| Master controls | Master pitch (±48 steps), spectral stretch (0.25×–4×), anchor mode (center partial / fixed f0 / manual Hz) |
| Volume & mix | Harmonic volume, partial gain, noise volume, noise magnitude scale, control smoothing |
| Conductor | Start frame, frame count, update rate; reads a window of the CSV on a clock |
| Sonification | Playback rate, Start / Stop |
| Bouncing | Timed spectral-stretch "bounce" with min/max scale, speed, jitter, duration |
| Utilities | Reset all, open visualization, preset stretch factors |
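The Bouncing controls can be pictured as a stretch factor that moves between the min and max scale over time. The sketch below models that in Python as a sine sweep in the log domain with optional random jitter. The actual curve used in the .scd file is not documented here, so the shape, the function name `bounce_scale`, and the parameter names are all assumptions.

```python
import math
import random


def bounce_scale(t, min_scale=0.5, max_scale=2.0, speed_hz=1.0,
                 jitter=0.0, rng=None):
    """Stretch factor at time t (seconds): log-domain sine sweep plus jitter."""
    lo, hi = math.log(min_scale), math.log(max_scale)
    # Position within the sweep, 0..1.
    position = 0.5 * (1.0 + math.sin(2.0 * math.pi * speed_hz * t))
    log_scale = lo + (hi - lo) * position
    if jitter and rng is not None:
        # Jitter is a fraction of the full log range.
        log_scale += rng.uniform(-jitter, jitter) * (hi - lo)
    # Clamp so jitter never pushes past the configured limits.
    return min(max_scale, max(min_scale, math.exp(log_scale)))


rng = random.Random(0)
trajectory = [bounce_scale(i / 20.0, jitter=0.05, rng=rng) for i in range(40)]
```

Sweeping in the log domain makes the motion symmetric in musical terms: with limits of 0.5× and 2×, the midpoint of the bounce is 1× rather than 1.25×.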

All transforms apply in real time to both harmonics and noise bands in parallel. Pitch uses configurable equal temperament; stretch uses log-frequency scaling around a selectable anchor.
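One common way to express these two transforms is shown below in Python rather than SuperCollider: stretching raises each frequency's ratio to the anchor to a power (a linear scaling in log-frequency), and pitch multiplies by 2^(steps/TET). Whether the .scd file uses exactly these formulas or this order of operations is an assumption; `transform_frequency` is a hypothetical name.

```python
def transform_frequency(freq_hz, anchor_hz, steps=0, tet=12, stretch=1.0):
    """Stretch around the anchor in log-frequency, then transpose by TET steps."""
    # log(f') - log(anchor) = stretch * (log(f) - log(anchor))
    stretched = anchor_hz * (freq_hz / anchor_hz) ** stretch
    # Equal temperament: `tet` steps per octave.
    return stretched * 2.0 ** (steps / tet)


partials = [200.0, 400.0, 600.0]

# Twelve 12-TET steps: every partial moves up one octave.
octave_up = [transform_frequency(f, 200.0, steps=12) for f in partials]

# Stretch 2x around the fundamental: intervals above the anchor double in size.
wide = [transform_frequency(f, 200.0, stretch=2.0) for f in partials]
```

The anchor frequency itself is unchanged by the stretch, which is why the choice of anchor (center partial, fixed f0, or manual Hz) determines which part of the spectrum stays in place.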

Optional modules

| File | Role |
| --- | --- |
| sc/harmonic_noise_unified_controller.scd | Earlier version (hardcoded CSV path) |
| sc/spectral_mapping.scd | Map learned partials onto user-defined target spectra |

Quick start with SuperCollider

# After CSV export:
cp training/csv_exports/chunk_00_unified_*.csv examples/

# In SuperCollider: evaluate sc/harmonic_noise_unified_controller_bouncing.scd
# Then click "Import CSV..." and select your file

Training workflow

Full step-by-step details: training/README.md.

# 1. Place .wav files in training/source_audio/

# 2. Start GPU Docker container
cd training
./0-start_docker.sh
# inside container:
./1-start_training.sh

# 3. Extract envelopes (host, with DDSP env)
# activate your DDSP environment
python 2-chunks2envelopes.py

# 4. Export unified CSV
python 3-envelopes2csv.py --all-chunks

# 5. Play in SuperCollider (see above)

Repository layout

ddsp2sc/
├── ginn/                  # DDSP gin configs (solo_instrument, datasets, eval)
├── training/              # Training workspace (Docker, scripts, outputs)
│   ├── source_audio/      # Your training WAVs
│   ├── prepared/          # TFRecord shards
│   ├── training_out/      # Checkpoints
│   ├── envelopes/         # .npz + preview WAVs
│   └── csv_exports/       # Unified CSVs for SuperCollider
├── python/                # Export, enrichment, modification utilities
├── sc/                    # SuperCollider GUI sonification apps
├── docs/                  # CSV format & downsampling guides
├── docker/                # Docker image definition
└── examples/              # Sample CSVs for SC (copy exports here)

Requirements

Training (Docker): TensorFlow + DDSP — provided by the training Docker image (training/0-start_docker.sh).

Export / manipulation (host):

pip install numpy scipy pandas soundfile

Sonification: SuperCollider 3.12+


Citation

DDSP:

@inproceedings{engel2020ddsp,
  title={DDSP: Differentiable Digital Signal Processing},
  author={Jesse Engel and Lamtharn (Hanoi) Hantrakul and Chenjie Gu and Adam Roberts},
  booktitle={ICLR},
  year={2020}
}
