govzman/dla-avss

Audio-Visual Source Separation (AVSS) with PyTorch

About • Installation • How To Use • Credits • License

About

This repository was used to conduct experiments for the audio-visual source separation (AVSS) task.

Installation

Follow these steps to install the project:

  1. (Optional) Create and activate a new environment using conda.

    # create env
    conda create -n project_env python=3.9
    
    # activate env
    conda activate project_env
  2. Install all required packages.

    pip install -r requirements.txt
  3. Download the video encoder and compute the video features.

    python3 src/utils/download_video_encoder.py
    python3 -m src.utils.create_video_features path/to/dataset/folder path/to/video_encoder/checkpoint.pth
    
  4. Download the best model checkpoint.

    mkdir models/
    python3 src/utils/download_pretrained.py "models/avlit_best" "https://drive.google.com/uc?export=download&id=101-FcQSouUsiVRYeRmDnqjIUfwfKgY_B" ""
    

How To Use

To train a model, run the following command:

python3 train.py -cn=CONFIG_NAME HYDRA_CONFIG_ARGUMENTS

Where CONFIG_NAME is the name of a config from src/configs (one of {avlit, rtfsnet_train, csfnet, tfgridnet, iianet}) and HYDRA_CONFIG_ARGUMENTS are optional Hydra override arguments.
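As a minimal sketch, the training command can also be assembled from Python, which makes it easy to reject a config name that is not in the list above. The helper name `build_train_command` and the override key `trainer.n_epochs` are hypothetical and used only for illustration; check the configs in src/configs for the keys that actually exist.

```python
import shlex

# Config names listed in this README (they live in src/configs).
KNOWN_CONFIGS = {"avlit", "rtfsnet_train", "csfnet", "tfgridnet", "iianet"}


def build_train_command(config_name, overrides=()):
    """Return the argv list for a training run.

    `overrides` is a sequence of Hydra-style key=value strings.
    """
    if config_name not in KNOWN_CONFIGS:
        raise ValueError(f"unknown config: {config_name!r}")
    return ["python3", "train.py", f"-cn={config_name}", *overrides]


# "trainer.n_epochs" is a hypothetical key, shown only as an example override.
command = build_train_command("avlit", ["trainer.n_epochs=10"])
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root.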

To run inference, use the command below. It processes the audio mixes and saves the results to ./{save_path} (data/saved/inferenced by default; it can be changed with the Hydra override inferencer.save_path=new_save_path):

python3 inference.py -cn=inference.yaml +datasets.val.data_dir=path/to/dataset/folder

The predictions for the first speaker are saved in the folder ./{save_path}/s1 and those for the second speaker in ./{save_path}/s2.
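The output layout described above can be walked with a few lines of Python. This is a sketch that assumes each mixture produces a file with the same name in both s1 and s2; the README does not state the naming scheme, so adjust the matching rule if the actual files differ. The function name `paired_predictions` is hypothetical.

```python
from pathlib import Path


def paired_predictions(save_path):
    """Yield (s1_file, s2_file) pairs that share a file name in both speaker folders.

    Assumes the s1 and s2 folders use matching file names (not confirmed by the README).
    """
    s1_dir = Path(save_path) / "s1"
    s2_dir = Path(save_path) / "s2"
    if not s1_dir.is_dir() or not s2_dir.is_dir():
        return  # nothing to pair if inference has not been run yet
    for s1_file in sorted(s1_dir.iterdir()):
        s2_file = s2_dir / s1_file.name
        if s1_file.is_file() and s2_file.is_file():
            yield s1_file, s2_file


# Default save path from this README; prints nothing if the folders do not exist.
for first, second in paired_predictions("data/saved/inferenced"):
    print(first.name, "->", second)
```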

To calculate metrics for the separated audio files:

python3 -m src.utils.calculate_metrics path/to/dataset/audio/folder /path/to/results/of/inference
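This README does not list which metrics the script reports. Purely as an illustration of what a separation metric computes, here is a dependency-free sketch of scale-invariant signal-to-noise ratio (SI-SNR), a metric commonly used for source separation. It is an assumption that a metric of this kind is relevant here, and this is not the repository's implementation; the function name `si_snr` is hypothetical.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB for two equal-length sequences of samples.

    Illustrative only: not taken from src.utils.calculate_metrics.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)
    # Remove the mean so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]
    # Whatever is left after the projection is treated as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(x * x for x in noise) + eps
    return 10.0 * math.log10(target_energy / noise_energy + eps)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate by a constant leaves the score essentially unchanged, which is why the metric is called scale-invariant.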

To calculate time/memory metrics for a model:

python3 time_memory_metrics.py datasets=eval model=MODEL_CONFIG

Where MODEL_CONFIG is one of {avlit, iianet, rtfs_net}.

If using rtfs_net, also add the argument model.num_rtfs_blocks=15.
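For a rough idea of what a time/memory measurement involves, the sketch below times a single call and records peak memory with the standard library. It is a generic illustration, not the logic of time_memory_metrics.py: `tracemalloc` only sees allocations made through Python's allocator, so it does not capture GPU memory or most tensor storage, and the helper name `measure` is hypothetical.

```python
import time
import tracemalloc


def measure(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds, peak_python_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


# Stand-in workload; in practice this would be a model's forward pass.
output, seconds, peak_bytes = measure(lambda: [i * i for i in range(100_000)])
print(f"{seconds:.4f} s, peak {peak_bytes / 1e6:.2f} MB")
```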

Credits

This repository is based on a PyTorch Project Template.

License

See the LICENSE file in the repository.
