LLM-based DICOM header analysis and heuristic file generation for heudiconv
LLM-Heuristics is a tool that leverages Large Language Models (LLMs) to automatically analyze DICOM headers and generate heuristic files for heudiconv, streamlining the conversion from DICOM to BIDS format.
- DICOM Analysis: Scans a DICOM directory and extracts metadata from the headers
- BIDS Mapping: Uses an LLM together with the BIDS schema to automatically map scanning sequences based on the extracted DICOM information
- Heuristic Generation: The LLM generates a Python heuristic file for heudiconv
All processing happens locally on your machine. No DICOM data, headers, or metadata are ever sent to external services or shared with third parties. The LLM models are downloaded once and run entirely on your local hardware (GPU/CPU).
- How It Works
- Quick Start
- Workflow
- Models
- Configuration
- Model Setup
- Contributing
- License
- Acknowledgments
- Support
```bash
# Install from source
git clone https://github.com/NeuroHackademy2025/llm-heuristics.git
cd llm-heuristics
pip install -e .[all]

# Install HeuDiConv (required dependency; already included in the
# docker://tientong/llm-heuristics:0.1.0 container)
pip install heudiconv[all]
```

```bash
# Pull the container
apptainer build llm-heuristics_0-1-0.sif docker://tientong/llm-heuristics:0.1.0
```
The workflow consists of the following main steps for DICOM-to-BIDS conversion:
- Analyze your dataset:

  ```bash
  # Run with multiple CPUs (cannot be used with --slurm)
  llm-heuristics analyze /path/to/dicom/data /path/to/output --n-cpus 16

  # Or run via SLURM jobs (cannot be used with --n-cpus):
  # generate a SLURM script for cluster processing
  llm-heuristics analyze /large/dicom/dataset /output/dir --slurm

  # Submit the generated job array
  sbatch /cluster/jobs/heudiconv_extract.slurm
  ```

  This creates in `/path/to/output/`:

  - `.heudiconv/` folder with HeuDiConv output files
  - `aggregated_dicominfo.tsv` file with aggregated DICOM metadata across all sequences and subjects (a quick pandas check is sketched below)
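If you want a quick sanity check of the extracted metadata before grouping, the TSV can be inspected with pandas. A minimal sketch; the exact columns depend on HeuDiConv's dicominfo output:

```python
import pandas as pd

# Load the aggregated per-series metadata produced by the analyze step
df = pd.read_csv("/path/to/output/aggregated_dicominfo.tsv", sep="\t")

print(df.shape)             # (number of series, number of header fields)
print(df.columns.tolist())  # available DICOM-derived columns
print(df.head())            # one row per scanned series
```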
- Group sequences (pandas groupby operations; a conceptual sketch follows this step):

  ```bash
  llm-heuristics group /path/to/output

  # apptainer (no GPU or models needed for grouping)
  apptainer run \
      -B /path/to/output:/output \
      llm-heuristics_0-1-0.sif group /output
  ```

  This creates:

  - `aggregated_dicominfo_groups.tsv` - Grouped sequences data with representative examples
  - `grouping_report.txt` - Detailed grouping summary and statistics
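Conceptually, this step is a pandas groupby over the columns that describe an acquisition, so series sharing the same signature collapse into one group. A minimal sketch under that assumption; the exact grouping columns used by `llm-heuristics group` may differ:

```python
import pandas as pd

df = pd.read_csv("/path/to/output/aggregated_dicominfo.tsv", sep="\t")

# Assumed acquisition signature: protocol name plus image dimensions and
# timing parameters (column names as emitted by HeuDiConv's dicominfo)
keys = ["protocol_name", "dim1", "dim2", "dim3", "dim4", "TR", "TE"]
groups = df.groupby(keys, dropna=False)

# How many series fell into each group
summary = groups.size().reset_index(name="n_series")
print(summary.sort_values("n_series", ascending=False))
```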
- Map to BIDS (LLM-based BIDS schema integration):

  ```bash
  llm-heuristics map-bids /path/to/output

  # apptainer
  apptainer run --nv \
      -B /path/to/output:/output \
      -B /path/to/models:/home/llmuser/models \
      llm-heuristics_0-1-0.sif map-bids /output
  ```

  This creates:

  - `aggregated_dicominfo_mapped.tsv` - Groups mapped to specific BIDS patterns with confidence scores (worth reviewing; a sketch follows this step)
  - `mapping_report.txt` - Detailed BIDS mapping summary and validation results
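Before generating a heuristic, it is worth reviewing the mapped TSV, especially low-confidence rows. A sketch of such a review; the column names here (`bids_pattern`, `confidence`) are hypothetical, so check the actual file header:

```python
import pandas as pd

df = pd.read_csv("/path/to/output/aggregated_dicominfo_mapped.tsv", sep="\t")

# Hypothetical column names -- adjust to the actual header
low_conf = df[df["confidence"] < 0.8]
print(low_conf[["protocol_name", "bids_pattern", "confidence"]])
```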
- Generate a heuristic file (uses heudiconv's convertall.py structure with the mapped data; an example of the generated layout is shown below):

  ```bash
  llm-heuristics generate /path/to/output -o heuristic.py

  # apptainer
  apptainer run --nv \
      -B /path/to/output:/output \
      -B /path/to/models:/home/llmuser/models \
      llm-heuristics_0-1-0.sif generate /output -o /output/heuristic.py
  ```
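The generated file follows heudiconv's standard heuristic layout: a `create_key` helper plus an `infotodict(seqinfo)` function that assigns series to BIDS path templates. A schematic example of that structure; the sequence-matching rules below are illustrative, not what the LLM will emit for your data:

```python
def create_key(template, outtype=("nii.gz",), annotation_classes=None):
    """Standard heudiconv helper mapping a BIDS path template to output types."""
    if template is None or not template:
        raise ValueError("Template must be a valid format string")
    return (template, outtype, annotation_classes)


def infotodict(seqinfo):
    """Assign each DICOM series to a BIDS key based on its header fields."""
    t1w = create_key("sub-{subject}/anat/sub-{subject}_T1w")
    rest = create_key("sub-{subject}/func/sub-{subject}_task-rest_bold")

    info = {t1w: [], rest: []}
    for s in seqinfo:
        protocol = s.protocol_name.lower()
        if "mprage" in protocol:    # illustrative matching rule
            info[t1w].append(s.series_id)
        elif "rest" in protocol:    # illustrative matching rule
            info[rest].append(s.series_id)
    return info
```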
- Use with heudiconv (a multi-subject wrapper is sketched after this step):

  ```bash
  heudiconv -d /path/to/dicom/{subject}/*/*.dcm \
      -o /path/to/bids \
      -f heuristic.py \
      -s 01 \
      -c dcm2niix \
      -b
  ```
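To convert several subjects, the same call can be scripted. A small wrapper, assuming the subject IDs are known (flags copied from the command above; the IDs are illustrative):

```python
import subprocess

subjects = ["01", "02", "03"]  # illustrative subject IDs
for sub in subjects:
    # Same flags as the heudiconv call above; {subject} is expanded by heudiconv
    subprocess.run(
        [
            "heudiconv",
            "-d", "/path/to/dicom/{subject}/*/*.dcm",
            "-o", "/path/to/bids",
            "-f", "heuristic.py",
            "-s", sub,
            "-c", "dcm2niix",
            "-b",
        ],
        check=True,
    )
```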
The `--context` parameter provides custom guidance to the LLM for generating more appropriate heuristics:
```bash
# Example 1: Prefer raw/original data
llm-heuristics generate /output/dir -o heuristic.py \
    --context "Use raw sequences only, exclude any derived or motion-corrected data"

# Example 2: Use derived/processed data when needed
llm-heuristics generate /output/dir -o heuristic.py \
    --context "For functional data, use motion-corrected (derived) sequences instead of raw"
```

LLM-Heuristics uses Llama 3.1 models for BIDS mapping and heuristic generation. The `--model` parameter accepts any HuggingFace model name compatible with the Llama architecture. All models run locally on your machine.
You can use any Llama 3.1 model from HuggingFace, including:
- `meta-llama/Meta-Llama-3.1-70B`
- `meta-llama/Meta-Llama-3.1-8B`
- `meta-llama/Meta-Llama-3.1-70B-Instruct`
- `meta-llama/Meta-Llama-3.1-8B-Instruct`
| Use Case | Recommended Model | Reason |
|---|---|---|
| Production | 70B Instruct | Highest accuracy for medical imaging |
| Development/Testing | 8B Instruct | Faster iteration, lower resource usage |
| Limited Hardware | 8B Instruct | Lower memory and VRAM requirements |
```bash
# Use the 8B model for BIDS mapping
llm-heuristics map-bids /path/to/output \
    --model "meta-llama/Meta-Llama-3.1-8B-Instruct"

# Use higher precision (requires more GPU memory)
llm-heuristics map-bids /path/to/output \
    --model "meta-llama/Meta-Llama-3.1-8B-Instruct" \
    --no-quantization
```

- First run: Models are downloaded from HuggingFace and cached locally
- Subsequent runs: Uses cached models for faster startup
- Access: Requires HuggingFace account with appropriate model access
- Quantization: Enabled by default to reduce memory usage
What is Quantization?
Quantization is a technique used to reduce the memory footprint and computational requirements of LLMs. It works by reducing the precision of model weights from high-precision formats to lower-precision formats.
| Precision | Memory Usage | Quality | Speed |
|---|---|---|---|
| FP32 (32-bit) | 100% | Best | Slowest |
| FP16 (16-bit) | 50% | Very Good | Fast |
| INT8 (8-bit) | 25% | Good | Faster |
| INT4 (4-bit) | 12.5% | Acceptable | Fastest |
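The memory column follows directly from bytes per weight: weight memory ≈ parameter count × bits ÷ 8. A quick back-of-the-envelope check for the 8B and 70B models (weights only; activations and the KV cache add overhead on top):

```python
def weight_gb(params_billion: float, bits: int) -> float:
    # parameters * bits-per-weight / 8 bits-per-byte, expressed in GB
    return params_billion * bits / 8

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: 8B ~ {weight_gb(8, bits):5.1f} GB | "
          f"70B ~ {weight_gb(70, bits):5.1f} GB")
```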
In This Project:

- Default: 4-bit quantization enabled (`use_quantization=True`)
- Memory Savings: ~75% reduction in model size
- Quality Trade-off: Small quality reduction for significant memory savings
- Configurable: Use the `--no-quantization` flag for higher precision (an illustrative loading snippet is shown below)
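For reference, 4-bit loading in the Hugging Face ecosystem is typically enabled via `BitsAndBytesConfig`. An illustrative sketch of that common transformers + bitsandbytes pattern, not necessarily this project's exact internals:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# 4-bit weight quantization with bfloat16 compute for the matmuls
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available devices
)
```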
- For development: Use 8B model with quantization for faster iteration
- For production: Use 70B model for best accuracy
- Memory optimization: Keep quantization enabled unless you have GPU memory to spare
Create a configuration file or use environment variables:
```bash
# Environment variables
export LLM_MODEL_NAME="meta-llama/Meta-Llama-3.1-70B-Instruct"  # default model
export LLM_USE_QUANTIZATION="true"
export LOG_LEVEL="INFO"
```

Or create a `config.json` (note that JSON does not allow comments):

```json
{
  "model": {
    "name": "meta-llama/Meta-Llama-3.1-70B-Instruct",
    "use_quantization": true,
    "temperature": 0.7
  },
  "heuristic": {
    "include_session": true,
    "exclude_motion_corrected": true
  }
}
```
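How the two sources combine is not spelled out here; a plausible resolution order, purely an assumption for illustration, is environment variables overriding `config.json`, which overrides built-in defaults:

```python
import json
import os

# Assumed precedence: env vars > config.json > defaults (illustrative only)
cfg = {"name": "meta-llama/Meta-Llama-3.1-70B-Instruct", "use_quantization": True}

if os.path.exists("config.json"):
    with open("config.json") as f:
        cfg.update(json.load(f).get("model", {}))

if "LLM_MODEL_NAME" in os.environ:
    cfg["name"] = os.environ["LLM_MODEL_NAME"]
if "LLM_USE_QUANTIZATION" in os.environ:
    cfg["use_quantization"] = os.environ["LLM_USE_QUANTIZATION"].lower() == "true"

print(cfg)
```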
Quick Setup:
```bash
# Set your HuggingFace token
export HF_TOKEN="your_huggingface_token"
export HF_HOME="/path/to/hf/home"

pip install huggingface_hub
huggingface-cli login

cd $HF_HOME
hf download meta-llama/Meta-Llama-3.1-8B-Instruct --local-dir ./meta-llama/Meta-Llama-3.1-8B-Instruct
hf download meta-llama/Meta-Llama-3.1-70B-Instruct --local-dir ./meta-llama/Meta-Llama-3.1-70B-Instruct

# If downloads fail with timeout errors, raise the timeouts and retry
export HF_HUB_READ_TIMEOUT=120
export HF_HUB_CONNECTION_TIMEOUT=30
hf download meta-llama/Meta-Llama-3.1-70B-Instruct \
    --local-dir ./meta-llama/Meta-Llama-3.1-70B-Instruct \
    --max-workers 4
```

- Create a HuggingFace account: https://huggingface.co/join
- Request Llama access: https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct
- Create access token: https://huggingface.co/settings/tokens
- Wait for approval: Meta will review your request
We welcome contributions! Please see CONTRIBUTING.md for development setup and contribution guidelines.
This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.
Note: This software uses Large Language Models that are subject to separate license terms. The Llama models require separate licensing from Meta. See the LICENSE file for complete details on third-party components.
- HeuDiConv team for the DICOM-BIDS conversion tool
- Meta for the Llama models
- Hugging Face for the Transformers library and model hosting
- Issues: GitHub Issues
- Discussions: GitHub Discussions