
Fairness and Robustness of CLIP-Based Models for Chest X-rays

This repo contains the code to evaluate six CLIP-based architectures for the classification of chest X-rays.

How to install

Clone the repo, create your environment, and install the dependencies using the following commands (you may need to change your PyTorch version to fit your system):

#Clone the repo
git clone https://github.com/TheoSourget/clip_cxr_fairness.git

#Create a new python env
conda create --name clip_fairness python=3.10
conda activate clip_fairness

#Install the dependencies
pip install -r requirements.txt

or

#Clone the repo
git clone https://github.com/TheoSourget/clip_cxr_fairness.git

#Create the env and install the dependencies
make setup_env

Models:

You will need to download pretrained weights before using some of the models.

CXR-CLIP

Download the model weights from the original repo (we used the ResNet50 M,M,C14 version) and place them in the pretrained/cxrclip folder.

CheXzero

  1. Download the weights from this link, rename the file, and place it at pretrained/chexzero/clip_weights.pt
  2. Download the weights from this link and place the file at pretrained/chexzero/ViT-B-32.pt

Data:

MIMIC-CXR

TBA

NIH-CXR14

  1. Download the dataset from this link
  2. Place all the images from the original subfolders into a single data/CXR14/imgs folder (a sketch for this step follows the list)
  3. Place Data_Entry_2017.csv in the data/CXR14 folder
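The original archive stores the images in numbered subfolders (e.g. images_001/images/); here is a minimal Python sketch to flatten them, assuming that layout:

# flatten sketch -- assumption: the archive layout is images_*/images/*.png
import shutil
from pathlib import Path

src_root = Path("data/CXR14")               # where the archive was extracted
dst = Path("data/CXR14/imgs")
dst.mkdir(parents=True, exist_ok=True)

for img in src_root.glob("images_*/images/*.png"):
    shutil.move(str(img), dst / img.name)   # move each image into the flat folder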

NEATX

Download NIH-CX14_TubeAnnotations_NonExperts_aggregated.csv from this link and place it into the data/CXR14 folder

Process the datasets

You can process all the data and generate the datasets with the following command:

python process_datasets.py

The images will be resized to 224x224 and normalized.
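For reference, here is a minimal sketch of an equivalent preprocessing step, assuming torchvision and ImageNet-style normalization constants (the exact values used by process_datasets.py may differ):

# preprocessing sketch -- assumptions: torchvision, ImageNet normalization constants
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),                    # resize to 224x224
    transforms.ToTensor(),                            # scale pixels to [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet mean (assumed)
                         std=[0.229, 0.224, 0.225]),  # ImageNet std (assumed)
])

img = preprocess(Image.open("data/CXR14/imgs/00000001_000.png").convert("RGB"))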

To generate the drain labels, launch the drains_detection.py script after processing the data with the previous command. If you need to train the drain detection model, the following command will both train the model and apply it to the unannotated data:

python drains_detection.py --train --data_path PATH_TO_CXR14

If you already trained the model, you can use the following command to apply it to the unannotated data:

python drains_detection.py --weights PATH_TO_WEIGHTS  --data_path PATH_TO_CXR14
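For intuition, here is a minimal sketch of what such a detector can look like, assuming a torchvision ResNet backbone with a single-logit binary head; the actual architecture and training loop live in drains_detection.py and may differ:

# drain-detection sketch -- assumption: ResNet-50 backbone, single-logit binary head
import torch
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 1)  # drain / no-drain logit

model.eval()
with torch.no_grad():
    batch = torch.randn(8, 3, 224, 224)              # placeholder batch of images
    probs = torch.sigmoid(model(batch)).squeeze(1)   # P(drain) per image
    preds = (probs > 0.5).long()                     # hard drain labels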

How to use

Get the embeddings

The script generate_embeddings.py can be used to generate the embeddings. For example:

python generate_embeddings.py --model_name medimageinsight --batch_size 32 --dataset MIMIC

The options are:

  • --model_name: name of the model to apply. Choose between:
  • --batch_size: the number of images to process at the same time. For some models, the images will still be processed one by one
  • --dataset: the dataset to use, either MIMIC or CXR14

The paths need to be specified within generate_embeddings.py.
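Conceptually, the script runs the images through the chosen model's image encoder in batches; here is a minimal sketch of that loop, where encode_image is a hypothetical stand-in for whichever encoder each model exposes (not the repo's API):

# embedding-loop sketch -- encode_image is a hypothetical stand-in, not the repo's API
import numpy as np
import torch

def generate_embeddings(model, dataloader, device="cuda"):
    model.eval()
    chunks = []
    with torch.no_grad():
        for images, _ in dataloader:                 # batches of preprocessed images
            feats = model.encode_image(images.to(device))
            chunks.append(feats.cpu().numpy())
    return np.concatenate(chunks, axis=0)            # (n_images, embedding_dim)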

Get the probabilities

The script evaluate_performance.py computes the probabilities for the labels defined within the file and saves them in data/probas_dataset/. It also computes the AUC and AUCPR and saves the results in data/performance/dataset/ (a scoring sketch follows the options below). Here is an example to launch the script:

python evaluate_performance.py --model_name medimageinsight --batch_size 32

The options are:

  • --model_name: name of the model to apply. Choose between:
  • --batch_size: the number of images to process at the same time. For some models, the images will still be processed one by one
  • --dataset: name of the dataset you want to use, MIMIC or CXR14

The paths need to be specified within generate_embeddings.py.
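For intuition, here is a minimal sketch of a CLIP-style zero-shot scoring step, assuming precomputed L2-normalized image and label-prompt embeddings and scikit-learn metrics; the prompts and exact scoring in evaluate_performance.py may differ:

# zero-shot scoring sketch -- assumptions: L2-normalized embeddings, one prompt per
# label, binary ground truth y_true of shape (n_images, n_labels)
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def score(image_emb, text_emb, y_true):
    probs = image_emb @ text_emb.T                   # cosine similarity as probability proxy
    n_labels = text_emb.shape[0]
    auc = [roc_auc_score(y_true[:, i], probs[:, i]) for i in range(n_labels)]
    aucpr = [average_precision_score(y_true[:, i], probs[:, i]) for i in range(n_labels)]
    return probs, auc, aucpr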

Generate the tables and visualisations

To reproduce most of the tables and figures from the paper, you can launch generate_figures_tables.py:

python generate_figures_tables.py

To generate the PCA plots and the differences between centroids, you can use embedding_analysis.py. Here is an example:

python embedding_analysis.py --model_name medclip --projection_type PCA
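Here is a minimal sketch of the centroid computation, assuming scikit-learn PCA and a binary subgroup attribute (e.g. patient sex); the projections and groupings used in embedding_analysis.py may differ:

# PCA / centroid sketch -- assumptions: embeddings (n, d), binary group labels (n,)
import numpy as np
from sklearn.decomposition import PCA

def centroid_difference(embeddings, groups):
    proj = PCA(n_components=2).fit_transform(embeddings)  # 2D projection for plotting
    c0 = proj[groups == 0].mean(axis=0)                   # centroid of group 0
    c1 = proj[groups == 1].mean(axis=0)                   # centroid of group 1
    return np.linalg.norm(c1 - c0)                        # distance between centroids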

Citation

If you use our code for your research, please cite our paper:

@article{sourget2025fairnessclip,
    title={Fairness and Robustness of CLIP-Based Models for Chest X-rays}, 
    author={Théo Sourget and David Restrepo and Céline Hudelot and Enzo Ferrante and Stergios Christodoulidis and Maria Vakalopoulou},
    journal={arXiv preprint arXiv:2507.21291},
    year={2025},
}

Acknowledgement

This repo contains code from the original repos of the models, and we want to thank the authors of those repos.

If you're using any of the models and/or datasets for research, please remember to cite the corresponding original papers, following their authors' guidelines.
