xiweideng/MvKeTR

MvKeTR

[IJBHI 2025] This is the official implementation of MvKeTR: Chest CT Report Generation with Multi-View Perception and Knowledge Enhancement, accepted to the IEEE Journal of Biomedical and Health Informatics (J-BHI), 2025.

Abstract

CT report generation (CTRG) aims to automatically generate diagnostic reports for 3D volumes, relieving clinicians' workload and improving patient care. Despite their clinical value, existing works fail to effectively incorporate diagnostic information from multiple anatomical views and lack the related clinical expertise essential for accurate and reliable diagnosis. To resolve these limitations, we propose a novel Multi-view perception Knowledge-enhanced TransfoRmer (MvKeTR) to mimic the diagnostic workflow of clinicians. Just as radiologists first examine CT scans from multiple planes, a Multi-View Perception Aggregator (MVPA) with view-aware attention is proposed to synthesize diagnostic information from multiple anatomical views effectively. Then, inspired by how radiologists further refer to relevant clinical records to guide diagnostic decision-making, a Cross-Modal Knowledge Enhancer (CMKE) is devised to retrieve the most similar reports based on the query volume and incorporate domain knowledge into the diagnosis procedure. Furthermore, instead of traditional MLPs, we employ Kolmogorov-Arnold Networks (KANs) as the fundamental building blocks of both modules; KANs exhibit superior parameter efficiency and reduced spectral bias, which helps capture high-frequency components critical for CT interpretation while mitigating overfitting. Extensive experiments on the public CTRG-Chest-548K dataset demonstrate that our method outperforms prior state-of-the-art (SOTA) models across almost all metrics.
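To make the retrieval idea behind CMKE concrete, here is a minimal, dependency-free sketch of top-k retrieval over report embeddings. It is an illustration under stated assumptions, not the code in this repository: the abstract does not name the similarity measure, so cosine similarity is assumed here, and the function names `cosine_similarity` and `retrieve_top_k` as well as the toy 2-D vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_embedding, report_embeddings, k=3):
    """Return the indices of the k reports most similar to the query.

    ASSUMPTION: similarity is cosine similarity; the source only says that
    "the most similar reports" are retrieved for the query volume.
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), index)
        for index, embedding in enumerate(report_embeddings)
    ]
    # Highest similarity first; ties are broken by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:k]]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real report embeddings.
    reports = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    query = [1.0, 0.1]
    print(retrieve_top_k(query, reports, k=2))
```

In practice the embeddings would be high-dimensional and the search would typically be vectorized; the sketch only shows the ranking step.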

MvKeTR Overview

Requirements

  • torch==2.0.1
  • torchvision==0.15.2
  • numpy==1.22.4
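
Before training, it can help to confirm that the environment matches the pins above. The following is a small optional sketch, not part of this repository; the helper names are hypothetical. It uses only the standard library (`importlib.metadata`) and ignores local build tags such as `+cu118`, which PyTorch wheels often append to the version string.

```python
from importlib import metadata

# Pins copied from the Requirements list.
PINNED = {"torch": "2.0.1", "torchvision": "0.15.2", "numpy": "1.22.4"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, pinned):
    """Compare versions while ignoring a local build tag such as '+cu118'."""
    return installed is not None and installed.split("+")[0] == pinned


def check_requirements(pins=PINNED):
    """Map each package name to (installed_version, matches_pin)."""
    report = {}
    for package, pinned in pins.items():
        found = installed_version(package)
        report[package] = (found, matches_pin(found, pinned))
    return report


if __name__ == "__main__":
    for package, (found, ok) in check_requirements().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{package}: pinned {PINNED[package]}, installed {found} [{status}]")
```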

Model

You can download the model checkpoint from here.

Dataset

The CTRG-Chest-548K dataset used in this work can be downloaded from here. You also need to download clip_report_embeddings.npz from here; it contains the report embeddings of the CTRG-Chest-548K dataset, extracted with the CT-CLIP model pretrained on CT-RATE. Please place the extracted dataset files and clip_report_embeddings.npz in the data folder. The directory structure should look like:

data/
├── CTRG-Chest-548K/
└── clip_report_embeddings.npz
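
The layout above can be checked with a short script before launching training. This is an optional sketch, not part of this repository; the helper names are hypothetical. Because the array names stored inside clip_report_embeddings.npz are not documented here, the second helper simply lists whatever the archive contains, relying on the fact that an .npz file is a zip archive with one .npy member per array.

```python
import zipfile
from pathlib import Path

EXPECTED_DIR = "CTRG-Chest-548K"
EXPECTED_FILE = "clip_report_embeddings.npz"


def missing_entries(data_root):
    """Return the expected entries that are absent from the data folder."""
    root = Path(data_root)
    missing = []
    if not (root / EXPECTED_DIR).is_dir():
        missing.append(EXPECTED_DIR + "/")
    if not (root / EXPECTED_FILE).is_file():
        missing.append(EXPECTED_FILE)
    return missing


def npz_array_names(npz_path):
    """List the array names stored in an .npz file without loading the data."""
    with zipfile.ZipFile(npz_path) as archive:
        # Each array is stored as '<name>.npy'; strip the 4-character suffix.
        return [name[:-4] for name in archive.namelist() if name.endswith(".npy")]


if __name__ == "__main__":
    missing = missing_entries("data")
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("Arrays in the embeddings file:", npz_array_names(Path("data") / EXPECTED_FILE))
```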

Train

Run bash train_ctrg_chest.sh to train a model on the CTRG-Chest-548K dataset.

Test

Run bash test_ctrg_chest.sh to test a model on the CTRG-Chest-548K dataset.

Acknowledgements

This work builds upon the excellent work of CT2Rep, R2GenCMN, efficient-kan, and CT-CLIP.

Citation

If you use or extend our work, please cite our paper.

@article{deng2025mvketr,
  title={MvKeTR: Chest CT Report Generation With Multi-View Perception and Knowledge Enhancement},
  author={Deng, Xiwei and He, Xianchun and Bao, Jianfeng and Zhou, Yudan and Cai, Shuhui and Cai, Congbo and Chen, Zhong},
  journal={IEEE Journal of Biomedical and Health Informatics},  
  year={2025},
  publisher={IEEE}
}

Contact

If you have any questions, please feel free to contact xiweideng@stu.xmu.edu.cn.

License

This repository is released under the Apache License 2.0.
