About • Installation • How To Use • Credits • License
This repository contains the code used to conduct experiments for the audio-visual speech separation (AVSS) task.
Follow these steps to install the project:

- (Optional) Create and activate a new environment using conda:

  ```bash
  # create env
  conda create -n project_env python=3.9
  # activate env
  conda activate project_env
  ```

- Install all required packages:

  ```bash
  pip install -r requirements.txt
  ```

- Download the video encoder and compute video features:

  ```bash
  python3 src/utils/download_video_encoder.py
  python3 -m src.utils.create_video_features path/to/dataset/folder path/to/video_encoder/checkpoint.pth
  ```

- Download the best model checkpoint:

  ```bash
  mkdir models/
  python3 src/utils/download_pretrained.py "models/avlit_best" "https://drive.google.com/uc?export=download&id=101-FcQSouUsiVRYeRmDnqjIUfwfKgY_B" ""
  ```
To train a model, run the following command:

```bash
python3 train.py -cn=CONFIG_NAME HYDRA_CONFIG_ARGUMENTS
```

where `CONFIG_NAME` is a config from `src/configs` (one of `{avlit, rtfsnet_train, csfnet, tfgridnet, iianet}`) and `HYDRA_CONFIG_ARGUMENTS` are optional Hydra overrides.
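For example, a training run that selects the AVLIT config and overrides one value through Hydra might look like the sketch below; the `trainer.n_epochs` key is hypothetical and the real keys depend on the configs in `src/configs`:

```shell
# train with the avlit config from src/configs
python3 train.py -cn=avlit

# same run, overriding a config value via a Hydra argument
# (trainer.n_epochs is an assumed key -- check src/configs for the real ones)
python3 train.py -cn=avlit trainer.n_epochs=100
```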
To run inference (separate the audio mixtures and save the results to `./{save_path}`; the default is `data/saved/inferenced` and can be changed via the Hydra override `inferencer.save_path=new_save_path`):

```bash
python3 inference.py -cn=inference.yaml +datasets.val.data_dir=path/to/dataset/folder
```

The predictions for the first speaker will be in the folder `./{save_path}/s1`, and for the second speaker in `./{save_path}/s2`.
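Combining the inference command with the `inferencer.save_path` override mentioned above, a sketch of a run that writes predictions to a custom folder (the folder name is only an example):

```shell
# run inference and redirect results away from the default save path
python3 inference.py -cn=inference.yaml \
    +datasets.val.data_dir=path/to/dataset/folder \
    inferencer.save_path=my_results
# speaker estimates should then appear under ./my_results/s1 and ./my_results/s2
```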
To calculate metrics on the given separated audios:

```bash
python3 -m src.utils.calculate_metrics path/to/dataset/audio/folder /path/to/results/of/inference
```

To calculate time/memory metrics on the given separated audios:

```bash
python3 time_memory_metrics.py datasets=eval model=MODEL_CONFIG
```

where `MODEL_CONFIG` is one of `{avlit, iianet, rtfs_net}`. If using `rtfs_net`, also add the argument `model.num_rtfs_blocks=15`.
This repository is based on a PyTorch Project Template.