This repo contains the code to evaluate six CLIP-based architectures for the classification of chest X-rays.
Clone the repo, create your environment, and install the dependencies using the following commands (you may need to change your PyTorch version to fit your system):
```bash
# Clone the repo
git clone https://github.com/TheoSourget/clip_cxr_fairness.git

# Create a new python env
conda create --name clip_fairness python=3.10
conda activate clip_fairness

# Install the dependencies
pip install -r requirements.txt
```

or

```bash
# Clone the repo
git clone https://github.com/TheoSourget/clip_cxr_fairness.git

# Create the env and install the dependencies
make setup_env
```

You will need to download pretrained weights before using some models:
Download the model weights from their original repo (we used the ResNet50 M,M,C14) and place the file in the pretrained/cxrclip folder
- Download the weights from this link, rename the file, and place it at pretrained/chexzero/clip_weights.pt
- Download the weights from this link and place the file at pretrained/chexzero/ViT-B-32.pt
TBA
Download the dataset from this link. In the data folder, place all the images from the original subfolders into a single data/CXR14/imgs folder (a sketch for flattening the folders is shown below). Place Data_Entry_2017.csv in the data/CXR14 folder.
Download NIH-CX14_TubeAnnotations_NonExperts_aggregated.csv from this link and place it in the data/CXR14 folder.
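To flatten the download into a single folder, something like the following can be used (a minimal sketch; it assumes the archives were extracted into subfolders such as images_001/images/ under data/CXR14, which may differ on your system):

```python
# Minimal sketch: move every image from the extracted subfolders
# (e.g. images_001/images/*.png) into a single data/CXR14/imgs folder.
# Adjust src_root to wherever you extracted the archives.
import shutil
from pathlib import Path

src_root = Path("data/CXR14")    # assumption: extraction location
dst = Path("data/CXR14/imgs")
dst.mkdir(parents=True, exist_ok=True)

for img in src_root.glob("images_*/images/*.png"):
    shutil.move(str(img), dst / img.name)
```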
You can process all the data, generating the original datasets, with the command:

```bash
python process_datasets.py
```

The images will be resized to 224x224 and normalized.
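For reference, the resize and normalization step typically looks like the following (a minimal sketch using torchvision; the exact interpolation and normalization statistics used by process_datasets.py may differ):

```python
# Sketch of the 224x224 resize + normalization (the stats are assumptions,
# not necessarily those used by process_datasets.py).
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),                        # scales pixels to [0, 1]
    transforms.Normalize(mean=[0.5], std=[0.5]),  # assumption: grayscale stats
])

tensor = preprocess(Image.open("data/CXR14/imgs/00000001_000.png").convert("L"))
```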
To generate the drain labels, launch the drains_detection.py script after processing the data with the previous command. If you need to train the drain detection model, the following command will both train the model and apply it to unannotated data:
```bash
python drains_detection.py --train --data_path PATH_TO_CXR14
```

If you already trained the model, you can use the following command to apply it to the unannotated data:

```bash
python drains_detection.py --weights PATH_TO_WEIGHTS --data_path PATH_TO_CXR14
```

The script generate_embeddings.py can be used to generate the embeddings, for example with the following command:
```bash
python generate_embeddings.py --model_name medimageinsight --batch_size 32 --dataset MIMIC
```

The options are:
- --model_name: name of the model to apply. Choose between:
- --batch_size: the number of images to process at the same time. For some models the images will still be processed one by one (see the sketch after this list).
- --dataset: the dataset to use, either MIMIC or CXR14.
The paths need to be specified within generate_embeddings.py.
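For intuition, the batched embedding generation conceptually looks like this (a hypothetical sketch, not the repo's actual code; encode_image follows the OpenAI CLIP naming, and the model and loader names are placeholders):

```python
# Hypothetical sketch of batched embedding generation (not the repo's code).
import torch

@torch.no_grad()
def generate_embeddings(model, loader):
    """Encode images batch by batch and stack the resulting embeddings."""
    features = []
    for images, _ in loader:               # images: (batch_size, C, 224, 224)
        features.append(model.encode_image(images))
    return torch.cat(features)             # (n_images, embedding_dim)
```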
The script evaluate_performance.py computes the probabilities for the labels defined within the file and saves them in data/probas_dataset/. It will also compute the AUC and AUCPR and save the results in data/performance/dataset/ (a sketch of the metric computation is shown after the options below). Here is an example of how to launch the script:
```bash
python evaluate_performance.py --model_name medimageinsight --batch_size 32
```

The options are:
- --model_name: name of the model to apply. Choose between:
- --batch_size: the number of images to process at the same time. For some models the images will still be processed one by one.
- --dataset: name of the dataset you want to use, MIMIC or CXR14.
The paths need to be specified within generate_embeddings.py.
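Under the hood, the per-label AUC and AUCPR can be computed from the saved probabilities roughly as follows (a minimal sketch using scikit-learn; the file names and array layout are assumptions, not the repo's exact outputs):

```python
# Minimal sketch of per-label AUC / AUCPR computation (file names are
# placeholders, not the repo's exact outputs).
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = np.load("data/probas_dataset/labels.npy")  # (n_samples, n_labels), 0/1
y_prob = np.load("data/probas_dataset/probas.npy")  # (n_samples, n_labels)

for i in range(y_true.shape[1]):
    auc = roc_auc_score(y_true[:, i], y_prob[:, i])
    aucpr = average_precision_score(y_true[:, i], y_prob[:, i])
    print(f"label {i}: AUC={auc:.3f}, AUCPR={aucpr:.3f}")
```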
To reproduce most of the tables and figures from the paper, you can launch generate_figures_tables.py:
```bash
python generate_figures_tables.py
```

To generate the PCA plots and the differences between centroids, you can use embedding_analysis.py. Here is an example:

```bash
python embedding_analysis.py --model_name medclip --projection_type PCA
```
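The PCA projection itself amounts to something like the following (a minimal sketch with scikit-learn; the embedding file name is a placeholder):

```python
# Minimal sketch of a 2D PCA projection of the embeddings (the .npy path
# is a placeholder, not the repo's exact output).
import numpy as np
from sklearn.decomposition import PCA

embeddings = np.load("embeddings_medclip_mimic.npy")  # (n_images, dim)
projected = PCA(n_components=2).fit_transform(embeddings)

# Centroid of the projected points (per-group centroids can be compared
# by filtering rows before averaging).
centroid = projected.mean(axis=0)
```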
If you used our code for your research, please cite our paper:

```bibtex
@article{sourget2025fairnessclip,
  title={Fairness and Robustness of CLIP-Based Models for Chest X-rays},
  author={Théo Sourget and David Restrepo and Céline Hudelot and Enzo Ferrante and Stergios Christodoulidis and Maria Vakalopoulou},
  journal={arXiv preprint arXiv:2507.21291},
  year={2025},
}
```
This repo contains code from the base repos of the models; we want to thank the authors of these repos:
If you're using any of the models and/or datasets for research, please remember to cite the corresponding original papers following their authors' guidelines.