
MTG/dcase2026_task1_baseline


DCASE2026 Challenge: Task 1 - Baseline System

Contact: Panagiota Anastasopoulou (panagiota.anastasopoulou@upf.edu), Music Technology Group, Universitat Pompeu Fabra, Barcelona

Heterogeneous Audio Classification

This task focuses on heterogeneous audio classification using the Broad Sound Taxonomy (BST), which comprises 5 top-level and 23 second-level sound categories. The goal is to evaluate sound classification models on diverse, real-world audio that varies widely in nature, duration, and recording conditions. To that end, two complementary Freesound-based datasets are provided: a curated set, BSD10k-v1.2, and a larger, noisier, crowd-sourced collection, BSD35k-CS, which reflects real-world labeling variability. Participants are encouraged to explore audio-based and multimodal approaches to sound classification, as well as to leverage the hierarchical relationships between taxonomy categories.

For a detailed description of the challenge and this task visit the DCASE website.

Baseline System

This repository contains the code for the baseline systems of the DCASE 2026 Challenge Task 1. It provides a full pipeline for training and evaluating an audio classification model using precomputed audio and text embeddings.

As a baseline system, we use variations of the HATR model presented at the DCASE Workshop [1]. We include a multimodal and an audio-only version; both are non-hierarchical models trained on audio (and text) representation vectors extracted with the 630k-audioset-fusion-best.pt checkpoint of the pretrained LAION-CLAP model.

The model is characterized by:

  • Multimodality: Supports both audio and text input (embeddings) with separate encoders
  • Attention-based fusion: Learns to weight modalities dynamically
  • Residual-based classifier: Stacked residual blocks
  • Data augmentation: Gaussian noise and random masking
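As a rough illustration of the attention-based fusion step, the sketch below weights the two modality embeddings with softmax-normalized scores. The scoring vector `w` is a stand-in for the model's learned attention parameters and an assumption for illustration, not the baseline's actual architecture:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fusion(audio_emb, text_emb, w):
    """Fuse two modality embeddings with softmax attention weights.

    audio_emb, text_emb: (d,) embedding vectors (e.g. 512-dim CLAP output).
    w: (d,) scoring vector, a stand-in for the learned attention parameters.
    """
    scores = np.array([w @ audio_emb, w @ text_emb])
    alpha = softmax(scores)          # per-modality weights, sum to 1
    return alpha[0] * audio_emb + alpha[1] * text_emb
```

Because the weights are computed per example, the fusion can lean on the text embedding when it is informative and fall back to audio otherwise.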

For the evaluation phase, in addition to standard accuracy (micro and macro at both taxonomy levels), we compute hierarchical metrics (accuracy, precision, recall, F1) as part of the challenge's rules and ranking system.
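To make the set-based hierarchical metrics concrete, here is a minimal sketch under the assumption that each label is a (top-level, second-level) pair and that a label's ancestor set is the pair plus its top-level parent; the challenge's official scorer may differ in detail:

```python
def ancestors(label):
    """Ancestor set of a (top_level, second_level) label pair."""
    top, second = label
    return {top, (top, second)}

def hierarchical_prf(y_true, y_pred):
    """Set-based hierarchical precision, recall, and F1.

    A correct top-level prediction earns partial credit even when the
    second-level category is wrong.
    """
    inter = pred_tot = true_tot = 0
    for t, p in zip(y_true, y_pred):
        at, ap = ancestors(t), ancestors(p)
        inter += len(at & ap)      # shared ancestors
        pred_tot += len(ap)
        true_tot += len(at)
    hp = inter / pred_tot
    hr = inter / true_tot
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf
```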

Quick Start

  1. Clone this repository.

  2. Create and activate a conda environment:

conda create -n hac python=3.13
conda activate hac
  3. Install requirements:
pip3 install -r requirements.txt

You can edit the PyTorch version if necessary to suit your system.

  4. Download and extract the datasets: BSD10k-v1.2 and BSD35k-CS. Their file structure is described in their READMEs.

  5. Specify the input and output paths in config.yaml. Make sure all paths point to the correct directories or files before running the model. By default, all generated files for internal model use are stored in the data/ directory. We also assume that datasets are placed or symlinked into this directory.
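For illustration, a path section of config.yaml might look like the sketch below; the key names here are assumptions for illustration, so check the repository's shipped config for the real schema:

```yaml
# Hypothetical sketch -- the actual config.yaml defines its own keys.
paths:
  data_dir: data/                  # generated files for internal model use
  bsd10k_dir: data/BSD10k-v1.2     # extracted or symlinked dataset
  bsd35k_dir: data/BSD35k-CS
  output_dir: results/
```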

  6. Run the data preparation script:

python build_dataset.py
  7. Train the model:
python train_test.py

This script includes k-fold training and evaluation of the models on their respective test sets.
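The k-fold loop can be pictured with a generic index-splitting helper like the one below. This is a sketch only: the baseline's actual fold assignments come from the dataset metadata, not from random splits:

```python
import numpy as np

def kfold_indices(n, k=5, seed=0):
    """Yield (train, test) index arrays for k-fold cross-validation.

    Each of the n samples appears in exactly one test fold; the remaining
    samples form the corresponding training set.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)             # shuffle once, then partition
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```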

To run evaluation alone, use evaluate.py.

  8. Summarize the results:
python summarize_results.py

This script summarizes results for each model across all 5 folds.

Note: You can skip steps 6-8 and simply run: python main.py

Baseline Results

Pending (to be added soon)

Citations

[1] Panagiota Anastasopoulou, Jessica Torrey, Xavier Serra, and Frederic Font. Heterogeneous sound classification with the Broad Sound Taxonomy and Dataset. In Proc. Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE). 2024.
