LLM-based DICOM header analysis and heuristic file generation for heudiconv
LLM-Heuristics is a tool that leverages Large Language Models (LLMs) to automatically analyze DICOM headers and generate heuristic files for heudiconv, streamlining the conversion from DICOM to BIDS format.
- DICOM Analysis: Scans a DICOM directory and extracts metadata from the headers
- BIDS Mapping: Uses an LLM together with the BIDS schema to automatically map scanning sequences based on the extracted DICOM information
- Heuristic Generation: The LLM generates a Python heuristic file for heudiconv
All processing happens locally on your machine. No DICOM data, headers, or metadata are ever sent to external services or shared with third parties. The LLM models are downloaded once and run entirely on your local hardware (GPU/CPU).
- How It Works
- Quick Start
- Workflow
- Models
- Configuration
- Model Setup
- Contributing
- License
- Acknowledgments
- Support
```bash
# Install from source
git clone https://github.com/NeuroHackademy2025/llm-heuristics.git
cd llm-heuristics
pip install -e .[all]

# Install HeuDiConv (required dependency; already included in the
# docker://tientong/llm-heuristics:0.1.0 container)
pip install heudiconv[all]
```

```bash
# Pull the container
apptainer build llm-heuristics_0-1-0.sif docker://tientong/llm-heuristics:0.1.0
```
The workflow consists of the following main steps for DICOM-to-BIDS conversion:
- Analyze your dataset:

  ```bash
  # Run with multiple CPUs (cannot be used with --slurm)
  llm-heuristics analyze /path/to/dicom/data /path/to/output --n-cpus 16

  # Or run via SLURM jobs (cannot be used with --n-cpus):
  # generate a SLURM script for cluster processing
  llm-heuristics analyze /large/dicom/dataset /output/dir --slurm

  # Submit the generated job array
  sbatch /cluster/jobs/heudiconv_extract.slurm
  ```

  This creates in `/path/to/output/`:

  - `.heudiconv/` folder with HeuDiConv output files
  - `aggregated_dicominfo.tsv` file with aggregated DICOM metadata across all sequences and subjects (a quick pandas check is sketched below)
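If you want a quick sanity check of the extracted metadata before grouping, the TSV can be inspected with pandas. A minimal sketch; the exact columns depend on HeuDiConv's dicominfo output:

```python
import pandas as pd

# Load the aggregated per-series metadata produced by the analyze step
df = pd.read_csv("/path/to/output/aggregated_dicominfo.tsv", sep="\t")

print(df.shape)             # (number of series, number of header fields)
print(df.columns.tolist())  # available DICOM-derived columns
print(df.head())            # one row per scanned series
```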
- Group sequences (pandas groupby operations; a conceptual sketch follows this step):

  ```bash
  llm-heuristics group /path/to/output

  # apptainer (no GPU or models needed for grouping)
  apptainer run \
      -B /path/to/output:/output \
      llm-heuristics_0-1-0.sif group /output
  ```

  This creates:

  - `aggregated_dicominfo_groups.tsv` - Grouped sequences data with representative examples
  - `grouping_report.txt` - Detailed grouping summary and statistics
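Conceptually, this step is a pandas groupby over the columns that describe an acquisition, so series sharing the same signature collapse into one group. A minimal sketch under that assumption; the exact grouping columns used by `llm-heuristics group` may differ:

```python
import pandas as pd

df = pd.read_csv("/path/to/output/aggregated_dicominfo.tsv", sep="\t")

# Assumed acquisition signature: protocol name plus image dimensions and
# timing parameters (column names as emitted by HeuDiConv's dicominfo)
keys = ["protocol_name", "dim1", "dim2", "dim3", "dim4", "TR", "TE"]
groups = df.groupby(keys, dropna=False)

# How many series fell into each group
summary = groups.size().reset_index(name="n_series")
print(summary.sort_values("n_series", ascending=False))
```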
- Map to BIDS (LLM-based BIDS schema integration):

  ```bash
  llm-heuristics map-bids /path/to/output

  # apptainer
  apptainer run --nv \
      -B /path/to/output:/output \
      -B /path/to/models:/home/llmuser/models \
      llm-heuristics_0-1-0.sif map-bids /output
  ```

  This creates:

  - `aggregated_dicominfo_mapped.tsv` - Groups mapped to specific BIDS patterns with confidence scores (worth reviewing; a sketch follows this step)
  - `mapping_report.txt` - Detailed BIDS mapping summary and validation results
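Before generating a heuristic, it is worth reviewing the mapped TSV, especially low-confidence rows. A sketch of such a review; the column names here (`bids_pattern`, `confidence`) are hypothetical, so check the actual file header:

```python
import pandas as pd

df = pd.read_csv("/path/to/output/aggregated_dicominfo_mapped.tsv", sep="\t")

# Hypothetical column names -- adjust to the actual header
low_conf = df[df["confidence"] < 0.8]
print(low_conf[["protocol_name", "bids_pattern", "confidence"]])
```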
- Generate a heuristic file (uses heudiconv's convertall.py structure with the mapped data; an example of the generated layout is shown below):

  ```bash
  llm-heuristics generate /path/to/output -o heuristic.py

  # apptainer
  apptainer run --nv \
      -B /path/to/output:/output \
      -B /path/to/models:/home/llmuser/models \
      llm-heuristics_0-1-0.sif generate /output -o /output/heuristic.py
  ```
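The generated file follows heudiconv's standard heuristic layout: a `create_key` helper plus an `infotodict(seqinfo)` function that assigns series to BIDS path templates. A schematic example of that structure; the sequence-matching rules below are illustrative, not what the LLM will emit for your data:

```python
def create_key(template, outtype=("nii.gz",), annotation_classes=None):
    """Standard heudiconv helper mapping a BIDS path template to output types."""
    if template is None or not template:
        raise ValueError("Template must be a valid format string")
    return (template, outtype, annotation_classes)


def infotodict(seqinfo):
    """Assign each DICOM series to a BIDS key based on its header fields."""
    t1w = create_key("sub-{subject}/anat/sub-{subject}_T1w")
    rest = create_key("sub-{subject}/func/sub-{subject}_task-rest_bold")

    info = {t1w: [], rest: []}
    for s in seqinfo:
        protocol = s.protocol_name.lower()
        if "mprage" in protocol:    # illustrative matching rule
            info[t1w].append(s.series_id)
        elif "rest" in protocol:    # illustrative matching rule
            info[rest].append(s.series_id)
    return info
```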
- Use with heudiconv (a multi-subject wrapper is sketched after this step):

  ```bash
  heudiconv -d /path/to/dicom/{subject}/*/*.dcm \
      -o /path/to/bids \
      -f heuristic.py \
      -s 01 \
      -c dcm2niix \
      -b
  ```
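To convert several subjects, the same call can be scripted. A small wrapper, assuming the subject IDs are known (flags copied from the command above; the IDs are illustrative):

```python
import subprocess

subjects = ["01", "02", "03"]  # illustrative subject IDs
for sub in subjects:
    # Same flags as the heudiconv call above; {subject} is expanded by heudiconv
    subprocess.run(
        [
            "heudiconv",
            "-d", "/path/to/dicom/{subject}/*/*.dcm",
            "-o", "/path/to/bids",
            "-f", "heuristic.py",
            "-s", sub,
            "-c", "dcm2niix",
            "-b",
        ],
        check=True,
    )
```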
The `--context` parameter provides custom guidance to the LLM for generating more appropriate heuristics:
```bash
# Example 1: Prefer raw/original data
llm-heuristics generate /output/dir -o heuristic.py \
    --context "Use raw sequences only, exclude any derived or motion-corrected data"

# Example 2: Use derived/processed data when needed
llm-heuristics generate /output/dir -o heuristic.py \
    --context "For functional data, use motion-corrected (derived) sequences instead of raw"
```

LLM-Heuristics uses Llama 3.1 models for BIDS mapping and heuristic generation. The `--model` parameter accepts any HuggingFace model name compatible with the Llama architecture. All models run locally on your machine.
You can use any Llama 3.1 model from HuggingFace, including:
- `meta-llama/Meta-Llama-3.1-70B`
- `meta-llama/Meta-Llama-3.1-8B`
- `meta-llama/Meta-Llama-3.1-70B-Instruct`
- `meta-llama/Meta-Llama-3.1-8B-Instruct`
| Use Case | Recommended Model | Reason |
|---|---|---|
| Production | 70B Instruct | Highest accuracy for medical imaging |
| Development/Testing | 8B Instruct | Faster iteration, lower resource usage |
| Limited Hardware | 8B Instruct | Lower memory and VRAM requirements |
```bash
# Use the 8B model for BIDS mapping
llm-heuristics map-bids /path/to/output \
    --model "meta-llama/Meta-Llama-3.1-8B-Instruct"

# Use higher precision (requires more GPU memory)
llm-heuristics map-bids /path/to/output \
    --model "meta-llama/Meta-Llama-3.1-8B-Instruct" \
    --no-quantization
```

- First run: Models are downloaded from HuggingFace and cached locally
- Subsequent runs: Uses cached models for faster startup
- Access: Requires HuggingFace account with appropriate model access
- Quantization: Enabled by default to reduce memory usage
What is Quantization?
Quantization is a technique used to reduce the memory footprint and computational requirements of LLMs. It works by reducing the precision of model weights from high-precision formats to lower-precision formats.
| Precision | Memory Usage | Quality | Speed |
|---|---|---|---|
| FP32 (32-bit) | 100% | Best | Slowest |
| FP16 (16-bit) | 50% | Very Good | Fast |
| INT8 (8-bit) | 25% | Good | Faster |
| INT4 (4-bit) | 12.5% | Acceptable | Fastest |
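The memory column follows directly from bytes per weight: weight memory ≈ parameter count × bits ÷ 8. A quick back-of-the-envelope check for the 8B and 70B models (weights only; activations and the KV cache add overhead on top):

```python
def weight_gb(params_billion: float, bits: int) -> float:
    # parameters * bits-per-weight / 8 bits-per-byte, expressed in GB
    return params_billion * bits / 8

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: 8B ~ {weight_gb(8, bits):5.1f} GB | "
          f"70B ~ {weight_gb(70, bits):5.1f} GB")
```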
In This Project:

- Default: 4-bit quantization enabled (`use_quantization=True`)
- Memory Savings: ~75% reduction in model size
- Quality Trade-off: Small quality reduction for significant memory savings
- Configurable: Use the `--no-quantization` flag for higher precision (an illustrative loading snippet is shown below)
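For reference, 4-bit loading in the Hugging Face ecosystem is typically enabled via `BitsAndBytesConfig`. An illustrative sketch of that common transformers + bitsandbytes pattern, not necessarily this project's exact internals:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# 4-bit weight quantization with bfloat16 compute for the matmuls
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available devices
)
```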
- For development: Use 8B model with quantization for faster iteration
- For production: Use 70B model for best accuracy
- Memory optimization: Keep quantization enabled unless you have GPU memory to spare
Create a configuration file or use environment variables:
```bash
# Environment variables
export LLM_MODEL_NAME="meta-llama/Meta-Llama-3.1-70B-Instruct"  # default model
export LLM_USE_QUANTIZATION="true"
export LOG_LEVEL="INFO"
```

Or create a `config.json` (note that JSON does not allow comments):

```json
{
  "model": {
    "name": "meta-llama/Meta-Llama-3.1-70B-Instruct",
    "use_quantization": true,
    "temperature": 0.7
  },
  "heuristic": {
    "include_session": true,
    "exclude_motion_corrected": true
  }
}
```
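How the two sources combine is not spelled out here; a plausible resolution order, purely an assumption for illustration, is environment variables overriding `config.json`, which overrides built-in defaults:

```python
import json
import os

# Assumed precedence: env vars > config.json > defaults (illustrative only)
cfg = {"name": "meta-llama/Meta-Llama-3.1-70B-Instruct", "use_quantization": True}

if os.path.exists("config.json"):
    with open("config.json") as f:
        cfg.update(json.load(f).get("model", {}))

if "LLM_MODEL_NAME" in os.environ:
    cfg["name"] = os.environ["LLM_MODEL_NAME"]
if "LLM_USE_QUANTIZATION" in os.environ:
    cfg["use_quantization"] = os.environ["LLM_USE_QUANTIZATION"].lower() == "true"

print(cfg)
```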
Quick Setup:
```bash
# Set your HuggingFace token
export HF_TOKEN="your_huggingface_token"
export HF_HOME="/path/to/hf/home"

pip install huggingface_hub
huggingface-cli login

cd $HF_HOME
hf download meta-llama/Meta-Llama-3.1-8B-Instruct --local-dir ./meta-llama/Meta-Llama-3.1-8B-Instruct
hf download meta-llama/Meta-Llama-3.1-70B-Instruct --local-dir ./meta-llama/Meta-Llama-3.1-70B-Instruct

# If downloads fail with timeout errors, raise the timeouts and retry
export HF_HUB_READ_TIMEOUT=120
export HF_HUB_CONNECTION_TIMEOUT=30
hf download meta-llama/Meta-Llama-3.1-70B-Instruct \
    --local-dir ./meta-llama/Meta-Llama-3.1-70B-Instruct \
    --max-workers 4
```

- Create a HuggingFace account: https://huggingface.co/join
- Request Llama access: https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct
- Create access token: https://huggingface.co/settings/tokens
- Wait for approval: Meta will review your request
We welcome contributions! Please see CONTRIBUTING.md for development setup and contribution guidelines.
This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.
Note: This software uses Large Language Models that are subject to separate license terms. The Llama models require separate licensing from Meta. See the LICENSE file for complete details on third-party components.
- HeuDiConv team for the DICOM-BIDS conversion tool
- Meta for the Llama models
- Hugging Face for the Transformers library and model hosting
- Issues: GitHub Issues
- Discussions: GitHub Discussions