Multi-View Foundation Models

This repository is the official implementation of Multi-View Foundation Models by Leo Segre*, Or Hirschorn*, and Shai Avidan.

Introduction

We introduce a novel framework that transforms existing 2D Foundation Models (like DINO, SAM, and CLIP) into Multi-View Foundation Models. Current 2D models process images independently, leading to inconsistent feature representations for the same 3D point viewed from multiple camera angles.

Setup/Install

We recommend using Anaconda or Miniconda. To set up the environment, follow the instructions below.

Create environment

conda create --name multi_view_foundation_models -y python=3.10
conda activate multi_view_foundation_models
pip install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
pip install -e .

Demo

To run the demo on a sample scene (Pikachu), use the command below. It will download the Pikachu scene and the pretrained DINOv2_reg model and run a visual correspondence comparison.

python demo.py
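The README does not describe how the demo's visual correspondence comparison is computed. One common way to match points between two views from dense features is mutual nearest-neighbor matching under cosine similarity. The sketch below is an illustration under that assumption, not the demo's actual code. Plain Python lists stand in for per-patch feature vectors, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def nearest_neighbors(src: List[Vector], dst: List[Vector]) -> List[int]:
    """For each vector in src, return the index of the most similar vector in dst."""
    return [max(range(len(dst)), key=lambda j: cosine_similarity(s, dst[j])) for s in src]


def mutual_nearest_neighbors(feats_a: List[Vector], feats_b: List[Vector]) -> List[Tuple[int, int]]:
    """Keep only pairs (i, j) where i's best match in B is j AND j's best match in A is i."""
    a_to_b = nearest_neighbors(feats_a, feats_b)
    b_to_a = nearest_neighbors(feats_b, feats_a)
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]


# Toy example: three patch features per view.
view_a = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
view_b = [[0.0, 0.9], [0.9, 0.1], [-1.0, 0.0]]
print(mutual_nearest_neighbors(view_a, view_b))  # [(0, 1), (1, 0)]
```

The mutual check discards one-sided matches (patch 2 of view A prefers patch 1 of view B, but that patch prefers patch 0 of view A). Features that are more consistent across views should produce more, and more accurate, mutual matches.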

Data

Download the Generalization dataset from this link.

For the ScanNet++ dataset, each new user must submit an application to request access through the official ScanNet++ website. Under the ScanNet++ Terms of Use, we can only share the preprocessed data with people who have also signed the Terms of Use and been granted access to ScanNet++. Once the ScanNet++ team approves your application, forward the approval email to leosegre@mail.tau.ac.il and we will share our preprocessed data with you directly.

Extract Features

To extract features from images using a specific foundation model, run the following command. For example, for DINOv2:

python test/extract_features.py --exp_name dinov2_reg --colmap_path {path/to/data/root/dir} --exp_directory experiments --scene pikachu --load_pretrained
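To extract features for several scenes in one go, a small Python wrapper can call the script once per scene. This is a sketch that assumes it is run from the repository root inside the activated conda environment. It reuses only the flags shown in the command above; the helper names, the data path, and any scene name other than pikachu are hypothetical placeholders.

```python
import subprocess
import sys
from typing import List


def extract_features_cmd(exp_name: str, colmap_path: str, scene: str,
                         exp_directory: str = "experiments",
                         load_pretrained: bool = True) -> List[str]:
    """Build the argv for test/extract_features.py using the flags from the README command."""
    cmd = [sys.executable, "test/extract_features.py",  # same interpreter as the active env
           "--exp_name", exp_name,
           "--colmap_path", colmap_path,
           "--exp_directory", exp_directory,
           "--scene", scene]
    if load_pretrained:
        cmd.append("--load_pretrained")
    return cmd


def extract_all(exp_name: str, colmap_path: str, scenes: List[str]) -> None:
    """Run extraction scene by scene; check=True stops at the first failing scene."""
    for scene in scenes:
        subprocess.run(extract_features_cmd(exp_name, colmap_path, scene), check=True)
```

For example, `extract_all("dinov2_reg", "/path/to/data/root/dir", ["pikachu"])` reproduces the command above. Passing `load_pretrained=False` simply drops that flag.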

Training

Run the relevant training script. For example, for DINOv2:

python train/train_dino.py --exp_name {exp_name} --colmap_path {path/to/data/root/dir} --exp_directory {path/to/exp/dir} --config_name dinov2_reg.yaml
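The README does not document how the default config uses camera parameters. However, the separate dino_v2_reg_no_plucker.yaml config used when poses are missing suggests that cameras are encoded as Plücker ray coordinates. The sketch below computes the textbook 6-D Plücker ray (unit direction d, moment m = c × d, with c the camera center) for one pixel. It assumes a pinhole camera and COLMAP-style world-to-camera extrinsics (x_cam = R·x_world + t). It is not the repository's implementation, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Mat3 = Sequence[Sequence[float]]


def transpose(m: Mat3) -> List[List[float]]:
    return [[m[j][i] for j in range(3)] for i in range(3)]


def matvec(m: Mat3, v: Sequence[float]) -> Tuple[float, ...]:
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def cross(a: Sequence[float], b: Sequence[float]) -> Tuple[float, ...]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plucker_ray(u: float, v: float, fx: float, fy: float, cx: float, cy: float,
                R: Mat3, t: Sequence[float]) -> Tuple[float, ...]:
    """6-D Plücker coordinates (d, m) of the ray through pixel (u, v).

    Assumes world-to-camera extrinsics: x_cam = R @ x_world + t.
    """
    # Back-project the pixel with K^-1 to a camera-space direction.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    Rt = transpose(R)
    # Rotate the direction into world space and normalize it.
    d = matvec(Rt, d_cam)
    n = math.sqrt(sum(x * x for x in d))
    d = tuple(x / n for x in d)
    # Camera center in world coordinates: c = -R^T t.
    c = tuple(-x for x in matvec(Rt, t))
    # Moment m = c x d; (d, m) identifies the line regardless of which point on it is used.
    m = cross(c, d)
    return d + m


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(plucker_ray(0.0, 0.0, 1.0, 1.0, 0.0, 0.0, identity, (-1.0, 0.0, 0.0)))
# (0.0, 0.0, 1.0, 0.0, -1.0, 0.0)
```

Evaluating this at every pixel (or patch center) yields a 6-channel map per view that a network can consume alongside the image. Unlike a raw camera center plus direction, the moment does not depend on which point along the ray is chosen.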

If you don't have the camera parameters, use the regular training script with the no-plucker config (the dataloader will automatically use dummy poses):

python train/train_dino.py --exp_name {exp_name} --colmap_path {path/to/data/root/dir} --exp_directory {path/to/exp/dir} --config_name dino_v2_reg_no_plucker.yaml

Testing

python test/test_3d.py --exp_directory {exp_dir} --exp_name {exp_name} --colmap_path {path/to/data/root/dir} --results_dir {path/to/results/dir} --compare_to_base --fit3d
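The README does not say which metrics test_3d.py reports, or exactly what --compare_to_base and --fit3d compute. One common way to score cross-view correspondences is the Percentage of Correct Keypoints (PCK): the fraction of predicted matches that land within a pixel threshold of the ground truth. The sketch below is a generic PCK for illustration only. It is not necessarily what the test script computes, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck(pred: Sequence[Point], gt: Sequence[Point], threshold_px: float) -> float:
    """Fraction of predicted points within threshold_px (Euclidean distance) of their ground truth."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    if not gt:
        return 0.0
    hits = sum(math.dist(p, g) <= threshold_px for p, g in zip(pred, gt))
    return hits / len(gt)


# One prediction is 1 px off (a hit), the other 10 px off (a miss) at a 2 px threshold.
print(pck([(0.0, 0.0), (10.0, 0.0)], [(1.0, 0.0), (0.0, 0.0)], threshold_px=2.0))  # 0.5
```

To compare a base 2D model against its multi-view counterpart, compute PCK for both sets of predicted matches against the same ground truth and threshold.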

To test our pretrained models, use the command below. You can change the model type by setting --exp_name to one of dinov2_reg, dinov2_reg_no_plucker, dinov3, clip, or sam.

python test/test_3d.py --load_pretrained --exp_directory {exp_dir} --exp_name dinov2_reg --colmap_path {path/to/data/root/dir} --results_dir {path/to/results/dir} --compare_to_base --fit3d
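To evaluate every released model in one run, the command above can be looped over the five exp_name values listed. This is a minimal sketch that assumes it is run from the repository root. The helper names and the one-sub-directory-per-model layout under results_base are our own hypothetical choices, because the README does not say whether test_3d.py keeps results from different models apart.

```python
import os
import subprocess
import sys
from typing import List

# exp_name values for the released checkpoints, as listed in this README.
PRETRAINED_MODELS = ["dinov2_reg", "dinov2_reg_no_plucker", "dinov3", "clip", "sam"]


def build_test_command(exp_name: str, exp_directory: str, colmap_path: str,
                       results_dir: str) -> List[str]:
    """Argv for test/test_3d.py on a pretrained model, mirroring the README command."""
    return [sys.executable, "test/test_3d.py", "--load_pretrained",
            "--exp_directory", exp_directory,
            "--exp_name", exp_name,
            "--colmap_path", colmap_path,
            "--results_dir", results_dir,
            "--compare_to_base", "--fit3d"]


def evaluate_all_pretrained(exp_directory: str, colmap_path: str, results_base: str) -> None:
    """Run the test script once per model; check=True stops at the first failure."""
    for name in PRETRAINED_MODELS:
        # Hypothetical layout: one results sub-directory per model.
        results_dir = os.path.join(results_base, name)
        subprocess.run(build_test_command(name, exp_directory, colmap_path, results_dir),
                       check=True)
```

If your scenes lack camera parameters, the README recommends the no-plucker model, so you may want to restrict PRETRAINED_MODELS to dinov2_reg_no_plucker.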

If you don't have the camera parameters, use the standard test script with the no-plucker experiment name:

python test/test_3d.py --load_pretrained --exp_directory {exp_dir} --exp_name dinov2_reg_no_plucker --colmap_path {path/to/data/root/dir} --results_dir {path/to/results/dir} --compare_to_base --fit3d

BibTeX

If you find our models useful, please consider citing our paper!

@article{MultiViewFoundationModels2025,
      title={Multi-View Foundation Models}, 
      author={Leo Segre and Or Hirschorn and Shai Avidan},
      year={2025},
      eprint={2512.15708},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.15708}, 
}
