# InterTrack: Tracking Human Object Interaction without Object Templates

Official implementation for the 3DV'25 paper.

Project Page | ProciGen-video Dataset | ArXiv
- Template-free single-frame reconstruction: HDM.
- Template-based: CHORE, VisTracker.
- Demo code.
- Human registration as a standalone repo; see fast-human-reg.
- Training.
- Full BEHAVE evaluation.
- March 21, 2025: code released, hello world!
The code is tested on torch=2.4.1+cu121, CUDA 12.1, and Debian 11. In general, it should work with any torch build that matches CUDA 12.1.

We recommend using an anaconda environment:

```
conda create -n intertrack python=3.10
conda activate intertrack
```

Required packages can be installed by:

```
pip install -r pre-requirements.txt  # Install pytorch and others
pip install -r requirements.txt      # Install pytorch3d from source
```

SMPL body models: We use SMPL-H (mano_v1.2) from this website.
Download and unzip it to a local path, then set `SMPL_MODEL_ROOT` in `lib_smpl/const.py` accordingly.
To use the smplfitter, we also need a `kid_template.npy` file; see this doc.
To obtain it, log in to the AGORA project page, download `smpl_kid_template.npy` via SMIL/SMIL-X template -> SMIL (SMPL format), and rename it to `kid_template.npy`. In the end, the file structure should look like this:
SMPL_MODEL_ROOT
|--kid_template.npy
|--SMPLH_FEMALE.pkl
|--SMPLH_MALE.pkl

Download the pretrained model checkpoints:

```
python download_models.py
```

We prepare two example sequences for a quick start: one captured by a mobile phone and the other from the BEHAVE dataset.
Download the packed demo file from Edmond and then run:

```
unzip InterTrack-demo-data.zip -d demo-data
```
Once downloaded, update the following paths (use absolute paths):
- The path to `demo-data` in `dataset.demo_data_path`, i.e. this line.
- The path to packed data in `dataset.behave_packed_dir`, i.e. this line; modify this to `demo-data/packed`.
- The path to SMPL assets, i.e. set `SMPL_ASSETS_ROOT` to `demo-data/assets` in `lib_smpl/const.py`.
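Since the configs expect absolute paths, a small helper can normalize whatever you pass in. This is a hypothetical convenience snippet, not part of the repo:

```python
import os
from pathlib import Path

def to_abs(p: str) -> str:
    # Expand "~" and resolve relative components into an absolute path,
    # matching the configs' expectation of absolute paths (hypothetical helper).
    return str(Path(p).expanduser().resolve())

# Example: resolve the demo-data directory relative to the current directory.
demo_root = to_abs("demo-data")
assert os.path.isabs(demo_root)
```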
```
# Run InterTrack on the mobile phone sequence
bash scripts/demo_phone.sh

# Run InterTrack on one BEHAVE sequence
bash scripts/demo_behave.sh
```

After running InterTrack on the BEHAVE sequence, you can evaluate the results with:

```
python eval/eval_separate.py -pr outputs/corrAE/single/opt-hoi-orighdm/pred -gt outputs/stage2/single/demo-stage2/gt -split configs/splits/demo-table.json
```

You should see numbers like this:

```
All 679 images: hum_F-score@0.01m=0.3983 obj_F-score@0.01m=0.6754 H+O_F-score@0.01m=0.5647 CD=0.0257
```
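For reference, the F-score@τ reported above combines precision and recall between predicted and ground-truth point clouds at distance threshold τ (here 0.01 m). A minimal brute-force sketch of the metric follows; the actual eval script likely uses accelerated nearest-neighbor search rather than a dense distance matrix:

```python
import numpy as np

def fscore(pred: np.ndarray, gt: np.ndarray, tau: float = 0.01) -> float:
    # Dense pairwise distances between (N,3) prediction and (M,3) ground truth.
    # O(N*M) memory: fine for small clouds, not full-resolution ones.
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    precision = (d.min(axis=1) < tau).mean()  # predicted points close to GT
    recall = (d.min(axis=0) < tau).mean()     # GT points covered by prediction
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

pts = np.random.rand(100, 3)
print(fscore(pts, pts))  # identical clouds -> 1.0
```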
To run tests on more BEHAVE sequences, you will need to download this packed file and update `behave_packed_dir` in `configs/structured.py`. You will also need to prepare split files similar to `configs/splits/demo-seq-table-15fps.pkl` (for HDM reconstruction and optimization) and `configs/splits/demo-seq-table-15fps-video.pkl` (for object pose prediction).
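The exact schema of these split files is repo-specific, so before writing your own it helps to inspect the provided ones. A quick, hypothetical inspection helper, assuming the files unpickle to standard Python containers:

```python
import pickle

def peek(path):
    """Print a quick summary of a pickled split file (hypothetical helper)."""
    with open(path, "rb") as f:
        split = pickle.load(f)
    print(type(split).__name__)
    if isinstance(split, dict):
        print(list(split)[:5])        # first few keys
    elif isinstance(split, (list, tuple)):
        print(list(split)[:3])        # first few entries
    return split

# e.g. peek("configs/splits/demo-seq-table-15fps.pkl")
```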
Coming soon...
Some notes regarding reproducing the results in the paper:
- GT translation was used to run HDM (the first stage of InterTrack). This is the same protocol as the prior works PC2 and HDM. Because single-view reconstruction has an inherent depth-scale ambiguity, and the output is an unordered point cloud, it is difficult to align the prediction with GT points to compute meaningful numbers. We therefore simply follow previous works and use GT translation, which removes at least the depth ambiguity and focuses the evaluation on shape. If your work does not use GT translation, there are two possible ways to set up InterTrack for a potentially fairer comparison: run InterTrack with estimated 2D translation, following this script, and then either 1) align with GT using the SMPL vertices, or 2) use pure ICP.
- Pose estimation: we provide two pretrained checkpoints for object pose (step 4); see download models. The `TOPNet-5obj.pth` checkpoint is for BEHAVE and InterCap chairs, tables, and monitors; all other objects should use the other checkpoint, `TOPNet-small-objs.pth`.
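As a concrete illustration of the alignment options above: when GT translation is not used, the simplest correction one can apply before evaluation is a rigid translation matching the centroids of the two clouds. This is only a crude sketch; SMPL-vertex-based alignment or full ICP, as suggested above, would refine it further:

```python
import numpy as np

def align_centroids(pred: np.ndarray, ref: np.ndarray) -> np.ndarray:
    # Translate the predicted (N,3) cloud so its centroid matches the
    # reference cloud's centroid. This removes a global translation offset;
    # scale and rotation errors are untouched (hypothetical helper).
    return pred + (ref.mean(axis=0) - pred.mean(axis=0))

pred = np.random.rand(50, 3) + 5.0   # prediction with a constant offset
ref = pred - 5.0                     # reference placement
aligned = align_centroids(pred, ref)
print(np.allclose(aligned, ref))     # True: a pure-translation case
```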
Run tests on the full BEHAVE dataset: coming soon...
If you use the code, please cite:
```bibtex
@inproceedings{xie2024InterTrack,
  title = {InterTrack: Tracking Human Object Interaction without Object Templates},
  author = {Xie, Xianghui and Lenssen, Jan Eric and Pons-Moll, Gerard},
  booktitle = {International Conference on 3D Vision (3DV)},
  month = {March},
  year = {2025},
}

@inproceedings{xie2023template_free,
  title = {Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation},
  author = {Xie, Xianghui and Bhatnagar, Bharat Lal and Lenssen, Jan Eric and Pons-Moll, Gerard},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2024},
}
```
This project leverages the following excellent works; we thank the authors for open-sourcing their code:
- The PyTorch3D library.
- The diffusers library.
- The pc2 project.
- The smplfitter library from NLF.
Please see LICENSE.
