A Model Context Protocol (MCP) server for fetching sequencing data metadata from public genomics databases using ffq and running kallisto quantification.
1. Clone and set up the environment:
git clone https://github.com/gmtrash/ffq_mcp.git
cd ffq_mcp
conda env create -f environment.yml
conda activate ffq-mcp
2. Get the paths:
which python
# Copy this output (e.g., /home/user/miniconda3/envs/ffq-mcp/bin/python)
pwd
# Copy this output (e.g., /home/user/ffq_mcp)
3. Add to Claude Code:
# General syntax:
claude mcp add --transport stdio --scope user ffq -- \
<PYTHON_PATH> \
<REPO_PATH>/src/ffq_server.py
# Example (replace with YOUR actual paths):
claude mcp add --transport stdio --scope user ffq -- \
/home/aubreybailey/miniforge3/envs/ffq-mcp/bin/python \
/home/aubreybailey/code/ffq_mcp/src/ffq_server.py
Important flags:
- --transport stdio: Required. Specifies stdio transport for MCP
- --scope user: Makes it available in all your projects
- -- (double dash): Required separator between flags and command/args
- Use --scope project if you only want it in this specific project
4. Verify:
claude mcp list # Should show "ffq"
5. Test it: In any Claude Code session, ask: "What tools do you have from the ffq server?"
You should see 10 tools including fetch_metadata, summarize_experimental_design, and run_kallisto_quantification.
1. Clone and install:
git clone https://github.com/gmtrash/ffq_mcp.git
cd ffq_mcp
conda env create -f environment.yml
conda activate ffq-mcp
2. Find your Python path:
which python
# Copy this output (e.g., /Users/you/miniconda3/envs/ffq-mcp/bin/python)
3. Get your repo path:
pwd
# Copy this output (e.g., /Users/you/ffq_mcp)
4. Configure Claude Desktop:
Open the config file:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
Add this (replace paths with your actual paths from steps 2 and 3):
{
"mcpServers": {
"ffq": {
"command": "/Users/you/miniconda3/envs/ffq-mcp/bin/python",
"args": ["/Users/you/ffq_mcp/src/ffq_server.py"]
}
}
}
5. Restart Claude Desktop
6. Test it: Ask Claude: "What tools do you have from the ffq server?"
You should see 10 tools including fetch_metadata, summarize_experimental_design, and run_kallisto_quantification.
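If you would rather not edit the JSON by hand, the entry from step 4 can be generated with a few lines of Python. This is only a sketch: the helper names `build_config_entry` and `merge_into_config` are hypothetical and not part of this repository, and the script assumes the `src/ffq_server.py` layout described above.

```python
import json
import sys
from pathlib import Path


def build_config_entry(python_path, repo_path, name="ffq"):
    """Return an mcpServers mapping that launches src/ffq_server.py (hypothetical helper)."""
    script = Path(repo_path) / "src" / "ffq_server.py"
    return {
        "mcpServers": {
            name: {
                "command": str(python_path),
                "args": [str(script)],
            }
        }
    }


def merge_into_config(existing, entry):
    """Add the new server to an existing config dict without dropping other servers."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers.update(entry["mcpServers"])
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Run with the conda env activated, from the repository root.
    print(json.dumps(build_config_entry(sys.executable, Path.cwd()), indent=2))
```

Paste the printed block into the config file, or merge it into the existing contents if other servers are already configured.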
"error: unknown option '--command'"
- You're using old syntax. Use --transport stdio and the -- separator instead.
- Correct: claude mcp add --transport stdio --scope user ffq -- <python> <script>
Server shows "disconnected" in /mcp
- Check that your Python path is correct: which python (with the conda env activated)
- Make sure the script path points to src/ffq_server.py
- Try removing and re-adding: claude mcp remove ffq, then add again
"ffq not found" or "kallisto not found"
- Make sure the conda environment is created: conda env create -f environment.yml
- The MCP uses the full Python path, so conda doesn't need to be activated when running Claude Code
- Verify installation: Run <your-python-path> -m pip list | grep ffq
Need to update after git pull?
- No need to re-add the MCP server
- Just restart Claude Code or use /mcp to reconnect
- The server automatically uses the latest code from your repo
This MCP server enables Claude and other MCP clients to:
- Fetch metadata and download links for sequencing data from:
  - GEO (Gene Expression Omnibus)
  - SRA (Sequence Read Archive)
  - EMBL-EBI (European Bioinformatics Institute)
  - DDBJ (DNA Data Bank of Japan)
  - NIH Biosample
  - ENCODE (Encyclopedia of DNA Elements)
- Run kallisto quantification to create analysis-ready transcript-level quantitation matrices from RNA-seq data
The server provides 10 main tools:
Fetch comprehensive metadata for sequencing data using accession numbers or DOIs.
Supported accession formats:
- SRA: SRR, SRX, SRS, SRP
- GEO: GSE, GSM
- ENCODE accessions
- Bioproject codes
- Biosample IDs
- DOIs
Example:
{
"accession": "GSE123456",
"level": 2
}
Retrieve direct download URLs for sequencing data files from various providers.
Supported providers:
- ftp: FTP servers
- aws: Amazon Web Services
- gcp: Google Cloud Platform
- ncbi: NCBI servers
Example:
{
"accession": "SRR123456",
"provider": "aws"
}
Fetch metadata for multiple accessions simultaneously.
Example:
{
"accessions": ["GSM123456", "GSM123457", "GSM123458"],
"level": 1
}
Extract all SRA and GSM accession numbers from a study or publication.
Example:
{
"accession": "GSE123456"
}
Output:
{
"query": "GSE123456",
"sra_accessions": ["SRR123456", "SRR123457"],
"gsm_accessions": ["GSM123456", "GSM123457"],
"gse_accessions": ["GSE123456"],
"all_accessions": ["GSE123456", "GSM123456", "GSM123457", "SRR123456", "SRR123457"]
}
Extract sequencing data accession numbers from a publication URL or text.
This tool automatically scans publications and extracts:
- SRA: SRR, SRX, SRS, SRP (NCBI Sequence Read Archive)
- GEO: GSE, GSM, GPL (Gene Expression Omnibus)
- ENA: ERR, ERX, ERS, ERP (European Nucleotide Archive)
- BioProject: PRJNA, PRJEB, PRJDB
Works with:
- PubMed Central articles
- Journal websites (Nature, Science, Cell, etc.)
- Preprint servers (bioRxiv, medRxiv)
- DOI links
- Plain text
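To give a feel for prefix-based extraction, here is a minimal sketch that pulls accessions out of plain text with regular expressions. The patterns are assumptions built only from the prefixes listed above, and `extract_accessions` is a hypothetical name; the server's own matching rules may differ.

```python
import re

# Hypothetical patterns derived from the prefixes listed above.
PATTERNS = {
    "sra": r"\b(?:SRR|SRX|SRS|SRP)\d+\b",
    "geo": r"\b(?:GSE|GSM|GPL)\d+\b",
    "ena": r"\b(?:ERR|ERX|ERS|ERP)\d+\b",
    "bioproject": r"\bPRJ(?:NA|EB|DB)\d+\b",
}


def extract_accessions(text):
    """Return unique accessions found in `text`, grouped by database."""
    by_database = {
        db: sorted(set(re.findall(pattern, text)))
        for db, pattern in PATTERNS.items()
    }
    found = sorted(acc for accs in by_database.values() for acc in accs)
    return {
        "accessions_found": found,
        "count": len(found),
        "by_database": by_database,
    }
```

For URL sources, the page text would first have to be downloaded and stripped of markup before a function like this could be applied.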
Example (from URL):
{
"source": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1234567/",
"source_type": "url"
}
Example (from text):
{
"source": "The RNA-seq data are available at GEO under accession GSE123456. Individual samples are SRR123456, SRR123457, and SRR123458.",
"source_type": "text"
}
Output:
{
"source": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1234567/",
"source_type": "url",
"accessions_found": ["GSE123456", "SRR123456", "SRR123457", "SRR123458"],
"count": 4,
"by_database": {
"sra": ["SRR123456", "SRR123457", "SRR123458"],
"geo": ["GSE123456"],
"ena": [],
"bioproject": []
},
"by_type": {
"SRR": ["SRR123456", "SRR123457", "SRR123458"],
"GSE": ["GSE123456"]
}
}
Check if all required tools and dependencies are installed for the FFQ MCP server.
This tool validates:
- Command-line tools: ffq, kallisto, wget, AWS CLI
- Python packages: mcp, h5py, scipy, numpy
- Provides OS-specific installation instructions for missing dependencies
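A check of this kind can be approximated with the standard library alone. The sketch below is not the server's implementation: the function is a simplified stand-in that skips version detection and installation hints, and the "missing_dependencies" status string is an assumption.

```python
import importlib.util
import shutil


def check_environment(
    tools=("ffq", "kallisto", "wget", "aws"),
    packages=("mcp", "h5py", "scipy", "numpy"),
):
    """Report which command-line tools and Python packages can be found."""
    # shutil.which returns None when an executable is not on PATH.
    tool_status = {t: {"installed": shutil.which(t) is not None} for t in tools}
    # find_spec returns None when a top-level package cannot be imported.
    package_status = {
        p: {"installed": importlib.util.find_spec(p) is not None} for p in packages
    }
    missing = [
        name
        for name, status in {**tool_status, **package_status}.items()
        if not status["installed"]
    ]
    return {
        # "missing_dependencies" is a placeholder status, not confirmed by the server.
        "overall_status": "ready" if not missing else "missing_dependencies",
        "command_line_tools": tool_status,
        "python_packages": package_status,
        "missing": missing,
    }
```

Running `check_environment()` inside the activated conda environment gives a quick view of what is on PATH before starting the server.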
Example:
{
"verbose": true
}
Output:
{
"overall_status": "ready",
"operating_system": {
"system": "Linux",
"release": "5.15.0",
"machine": "x86_64"
},
"command_line_tools": {
"ffq": {"installed": true, "version": "0.3.1"},
"kallisto": {"installed": true, "version": "0.50.1"},
"wget": {"installed": true, "version": "1.21.3"},
"aws": {"installed": true, "version": "2.13.0"}
},
"python_packages": {
"mcp": {"installed": true, "version": "1.0.0"},
"h5py": {"installed": true, "version": "3.9.0"},
"scipy": {"installed": true, "version": "1.11.0"},
"numpy": {"installed": true, "version": "1.24.3"}
},
"missing": []
}
If dependencies are missing, the response includes detailed installation instructions tailored to your operating system.
Download reference transcriptome FASTA and GTF annotation files from Ensembl or GENCODE.
Supports:
- Organisms: Human (GRCh38), Mouse (GRCm39/GRCm38)
- Sources: Ensembl (comprehensive), GENCODE (high-quality human/mouse annotations)
- Automatic latest release detection
- Optional GTF annotation download
Example:
{
"organism": "human",
"source": "ensembl",
"output_dir": "./references",
"include_gtf": true
}
Output:
{
"status": "success",
"organism": "human",
"source": "ensembl",
"release": "110",
"output_directory": "./references",
"files_downloaded": {
"fasta": "./references/Homo_sapiens.GRCh38.cdna.all.fa.gz",
"gtf": "./references/Homo_sapiens.GRCh38.110.gtf.gz"
}
}
Build a kallisto index from a reference transcriptome FASTA file.
The index is required for quantification and needs to be built once per reference. Index building typically takes 5-10 minutes for human/mouse transcriptomes.
Example:
{
"fasta_path": "./references/Homo_sapiens.GRCh38.cdna.all.fa.gz",
"index_path": "./references/Homo_sapiens.GRCh38.idx",
"kmer_size": 31
}
Parameters:
- fasta_path (required): Path to transcriptome FASTA file (gzipped or uncompressed)
- index_path (optional): Output path for index (auto-generated if not specified)
- kmer_size (optional): K-mer size for index (default: 31). kallisto requires an odd value no larger than 31.
  - Use 31 for reads ≥50bp (standard RNA-seq)
  - Use an odd value between 15 and 21 for short reads <50bp
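The k-mer guidance above maps onto the `kallisto index` command line roughly as follows. This is a sketch, not the server's code: both function names are hypothetical, and picking 21 for short reads is just one choice within the suggested range.

```python
def suggest_kmer_size(read_length):
    """Pick a k-mer size from the read length, following the guidance above."""
    # 21 is an arbitrary odd value inside the suggested 15-21 range.
    return 31 if read_length >= 50 else 21


def build_index_command(fasta_path, index_path, kmer_size=31):
    """Build the argument list for `kallisto index` (hypothetical helper)."""
    if kmer_size < 1 or kmer_size > 31 or kmer_size % 2 == 0:
        raise ValueError("kmer_size must be an odd integer no larger than 31")
    return [
        "kallisto", "index",
        "-i", str(index_path),
        "-k", str(kmer_size),
        str(fasta_path),
    ]
```

The resulting list can be passed directly to `subprocess.run(..., check=True)` on a machine where kallisto is installed.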
Output:
{
"status": "success",
"index_path": "./references/Homo_sapiens.GRCh38.idx",
"fasta_path": "./references/Homo_sapiens.GRCh38.cdna.all.fa.gz",
"kmer_size": 31,
"index_size_mb": 3842.5,
"kallisto_version": "kallisto 0.50.1"
}
Run kallisto quantification on RNA-seq data and create transcript-level quantitation matrices.
This tool automates the complete workflow:
- Fetches FASTQ files using ffq metadata
- Downloads files from cloud storage (AWS S3, GCP, FTP)
- Runs kallisto quant on each sample
- Aggregates results into TPM and counts matrices
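The aggregation step can be pictured with the sketch below. It relies on the fact that `kallisto quant` writes an `abundance.tsv` per sample with the columns `target_id`, `length`, `eff_length`, `est_counts`, and `tpm`; the function names and the one-directory-per-accession layout are assumptions, and the server may organise its output differently.

```python
import csv
from pathlib import Path


def aggregate_abundance(sample_dirs, column="tpm"):
    """Collect one column from each sample's abundance.tsv into {transcript: [values]}."""
    samples = []
    matrix = {}
    for sample_dir in map(Path, sample_dirs):
        samples.append(sample_dir.name)  # assumes the directory is named after the run
        with open(sample_dir / "abundance.tsv", newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                matrix.setdefault(row["target_id"], []).append(float(row[column]))
    return samples, matrix


def write_matrix(path, samples, matrix):
    """Write a transcripts-by-samples table as tab-separated text."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["target_id", *samples])
        for transcript_id, values in matrix.items():
            writer.writerow([transcript_id, *values])
```

Calling `aggregate_abundance(dirs, column="est_counts")` would give the corresponding counts matrix. The sketch assumes every sample was quantified against the same index, so all files list the same transcripts.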
Example:
{
"accessions": ["SRR123456", "SRR123457", "SRR123458"],
"index_path": "/path/to/transcriptome.idx",
"output_dir": "/path/to/output",
"threads": 8,
"provider": "aws"
}
Parameters:
- accessions (required): List of SRA run accessions
- index_path (required): Path to kallisto index file
- output_dir (optional): Output directory (temp dir if not specified)
- threads (optional): Number of threads (default: 4)
- bootstrap_samples (optional): Bootstrap samples for uncertainty (default: 0)
- provider (optional): Download provider: aws, gcp, ftp, or ncbi (default: aws)
- output_format (optional): Output format: tsv, hdf5, or 10x (default: tsv)
  - tsv: Tab-separated text files
  - hdf5: HDF5 binary format (compatible with HDF5Array in R/Bioconductor)
  - 10x: 10x Genomics sparse matrix format (matrix.mtx, features.tsv, barcodes.tsv)
Output:
{
"status": "success",
"summary": {
"samples_processed": 3,
"num_transcripts": 178123,
"num_samples": 3
},
"output_directory": "/path/to/output",
"matrices": {
"tpm_matrix": "/path/to/output/tpm_matrix.tsv",
"counts_matrix": "/path/to/output/counts_matrix.tsv"
},
"samples": ["SRR123456", "SRR123457", "SRR123458"]
}
Safety Features:
- Detects UMI (Unique Molecular Identifier) usage and blocks quantification if UMIs are detected
- UMI protocols (10x Genomics, Drop-seq, etc.) require deduplication before quantification
- Provides clear error messages suggesting appropriate preprocessing tools (UMI-tools, zUMIs)
Parse experimental design from study metadata to identify sample groupings.
This tool extracts sample characteristics and groups samples by experimental conditions, making it easy for the LLM to understand the experimental structure.
Example:
{
"accession": "GSE123456"
}
Output:
{
"design_summary": "6 samples with treatment: 3 control vs 3 dexamethasone",
"total_samples": 6,
"variables": {
"treatment": {
"control": ["GSM001", "GSM002", "GSM003"],
"dexamethasone": ["GSM004", "GSM005", "GSM006"]
},
"replicate": {
"1": ["GSM001", "GSM004"],
"2": ["GSM002", "GSM005"],
"3": ["GSM003", "GSM006"]
}
}
}
Detects:
- Treatment/control groups
- Genotype differences
- Tissue types, cell types
- Time series experiments
- Replicates
- Other experimental factors (age, sex, disease, etc.)
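The grouping itself is straightforward once each sample's characteristics are available as key/value pairs. The following sketch assumes that input shape and uses a hypothetical function name; how the server parses characteristics out of GEO metadata is not shown here.

```python
from collections import defaultdict


def group_samples_by_variable(sample_traits):
    """Turn {sample: {variable: value}} into {variable: {value: [samples]}}."""
    variables = defaultdict(lambda: defaultdict(list))
    for sample_id, traits in sample_traits.items():
        for variable, value in traits.items():
            variables[variable][value].append(sample_id)
    # Convert nested defaultdicts to plain dicts so the result serialises cleanly.
    return {variable: dict(groups) for variable, groups in variables.items()}
```

Applied to six samples annotated with `treatment` and `replicate`, this produces a `variables` mapping with the same shape as the output shown above.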
New users: Run the interactive setup wizard:
python setup.py
This wizard will:
- Check your system for installed dependencies
- Recommend the best installation method for your OS
- Provide step-by-step installation instructions
- Validate your setup
Existing users: Use the check_environment tool within Claude to verify your setup:
"Check my environment for the FFQ MCP server"
You can install and run this server in any of the following ways (conda, Docker, Singularity, or manual installation):
Using conda ensures all dependencies are correctly installed:
# Clone the repository
git clone https://github.com/gmtrash/ffq_mcp.git
cd ffq_mcp
# Create conda environment
conda env create -f environment.yml
# Activate environment
conda activate ffq-mcp
# Test that it works
python src/ffq_server.py
# Press Ctrl+C to exit
Use Docker for isolated, reproducible deployments:
# Build the Docker image
docker build -t ffq-mcp .
# Run the container
docker run -i ffq-mcp
# Or with volume mounts for data
docker run -i -v /path/to/data:/data -v /path/to/index:/index ffq-mcp
For Claude Desktop with Docker:
{
"mcpServers": {
"ffq": {
"command": "docker",
"args": ["run", "-i", "--rm", "ffq-mcp"]
}
}
}
Singularity is ideal for HPC clusters that don't support Docker:
# Build Singularity image
sudo singularity build ffq_mcp.sif ffq_mcp.def
# Run the container
singularity run ffq_mcp.sif
For Claude Desktop with Singularity:
{
"mcpServers": {
"ffq": {
"command": "singularity",
"args": ["run", "/path/to/ffq_mcp.sif"]
}
}
}
If you prefer manual installation:
Prerequisites:
- Python 3.10 or higher
- ffq (for metadata fetching)
- kallisto (for quantification)
- wget or AWS CLI (for downloading FASTQ files)
cd ffq_mcp
pip install -r requirements.txt
pip install ffq
conda install -c bioconda kallisto # or install from source
You can prepare a kallisto index in two ways:
Option 1: Using the MCP tools (Recommended)
Ask Claude to download and index the reference:
"Download the human reference transcriptome from Ensembl and build a kallisto index"
Claude will call:
1. fetch_reference_transcriptome to download the FASTA file
2. build_kallisto_index to create the index
Option 2: Manual preparation
# Download a reference transcriptome (e.g., human from Ensembl)
wget http://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
# Build kallisto index
kallisto index -i Homo_sapiens.GRCh38.idx Homo_sapiens.GRCh38.cdna.all.fa.gz
Add the following to your Claude Desktop configuration file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Using Conda (Recommended):
{
"mcpServers": {
"ffq": {
"command": "/path/to/conda/envs/ffq-mcp/bin/python",
"args": [
"/absolute/path/to/ffq_mcp/src/ffq_server.py"
]
}
}
}
Using System Python:
{
"mcpServers": {
"ffq": {
"command": "python3",
"args": [
"/absolute/path/to/ffq_mcp/src/ffq_server.py"
]
}
}
}
Finding your conda path:
conda activate ffq-mcp
which python # Use this full path in the config
Use the server via stdio transport:
python /path/to/mcp-ffq-server/src/ffq_server.py
User: "What's the experimental design for GSE63525?"
Claude: Uses summarize_experimental_design and responds:
"This study has 6 samples with treatment groups:
- Control: 3 samples (GSM001, GSM002, GSM003)
- Dexamethasone: 3 samples (GSM004, GSM005, GSM006)
This is a classic 3v3 treatment vs control design with 3 biological replicates per condition."
💡 Why this is useful: Claude can now automatically understand experimental structure without you having to manually explain it. Perfect for downstream analysis planning!
User: "What are the SRA run accessions for GSE123456?"
Claude (using the MCP server):
I'll extract the SRA accessions from GSE123456.
[Calls extract_sra_gsm_numbers with accession: GSE123456]
The study GSE123456 contains the following SRA runs:
- SRR123456
- SRR123457
- SRR123458
User: "Get me the AWS download links for SRR123456"
Claude:
[Calls fetch_download_links with accession: SRR123456, provider: aws]
Here are the AWS S3 download links for SRR123456:
- s3://sra-pub-run-odp/sra/SRR123456/SRR123456_1.fastq.gz
- s3://sra-pub-run-odp/sra/SRR123456/SRR123456_2.fastq.gz
User: "What sequencing data is associated with DOI 10.1038/example.2024?"
Claude:
[Calls fetch_metadata with accession: 10.1038/example.2024]
This publication includes sequencing data from:
- Study: SRP123456
- Samples: 12 biological replicates
- Runs: 24 paired-end RNA-seq runs
- Total size: 156.3 GB
User: "Analyze the RNA-seq data from this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1234567/"
Claude:
[Step 1: Extract accessions from paper]
Calls extract_accessions_from_publication
Found accessions:
- GSE123456 (GEO Series)
- SRR123456, SRR123457, SRR123458 (SRA Runs)
[Step 2: Get detailed metadata]
Calls extract_sra_gsm_numbers with GSE123456
[Step 3: Downloads reference and builds index]
[Step 4: Runs quantification]
Analysis complete! Created expression matrices from the published data.
User: "Download the human reference from Ensembl, build an index, and quantify GSE123456"
Claude:
[Step 1: Downloads reference]
Calls fetch_reference_transcriptome with organism: human, source: ensembl
Downloaded:
- Homo_sapiens.GRCh38.cdna.all.fa.gz (170 MB)
- Homo_sapiens.GRCh38.110.gtf.gz (52 MB)
[Step 2: Builds index]
Calls build_kallisto_index with fasta_path: Homo_sapiens.GRCh38.cdna.all.fa.gz
Created kallisto index: Homo_sapiens.GRCh38.idx (3.8 GB)
[Step 3: Extracts SRA accessions]
Calls extract_sra_gsm_numbers with accession: GSE123456
Found 6 SRA runs: SRR123456-SRR123461
[Step 4: Runs quantification]
Calls run_kallisto_quantification with:
accessions: ["SRR123456", "SRR123457", "SRR123458", "SRR123459", "SRR123460", "SRR123461"],
index_path: "Homo_sapiens.GRCh38.idx",
threads: 8,
output_format: "hdf5"
Processing complete! Created quantitation matrices:
- TPM matrix: 6 samples x 178,123 transcripts
- Counts matrix: 6 samples x 178,123 transcripts
Output saved in HDF5 format:
- tpm_matrix.h5
- counts_matrix.h5
TSV format (default):
tpm_matrix.tsv
counts_matrix.tsv
HDF5 format (for R/Bioconductor HDF5Array):
tpm_matrix.h5
counts_matrix.h5
Each HDF5 file contains:
- matrix: Expression values
- features: Transcript IDs
- barcodes: Sample IDs
10x Genomics format (for Seurat, Scanpy):
tpm_10x/
├── matrix.mtx
├── features.tsv
└── barcodes.tsv
counts_10x/
├── matrix.mtx
├── features.tsv
└── barcodes.tsv
library(HDF5Array)
library(rhdf5) # provides h5read(), used below
# Read TPM matrix
tpm_h5 <- HDF5Array("tpm_matrix.h5", name = "matrix")
features <- as.character(h5read("tpm_matrix.h5", "features"))
samples <- as.character(h5read("tpm_matrix.h5", "barcodes"))
# Convert to SummarizedExperiment
library(SummarizedExperiment)
se <- SummarizedExperiment(
assays = list(tpm = tpm_h5),
rowData = DataFrame(transcript_id = features),
colData = DataFrame(sample_id = samples)
)
library(Seurat)
# Read counts matrix
counts <- Read10X(data.dir = "counts_10x")
seurat_obj <- CreateSeuratObject(counts = counts)
import scanpy as sc
# Read counts matrix
adata = sc.read_10x_mtx("counts_10x")
import h5py
import numpy as np
# Read TPM matrix
with h5py.File("tpm_matrix.h5", "r") as f:
    tpm = f["matrix"][:]
    features = f["features"][:].astype(str)
    samples = f["barcodes"][:].astype(str)
mcp-ffq-server/
├── src/
│ └── ffq_server.py # Main MCP server implementation
├── requirements.txt # Python dependencies
├── pyproject.toml # Project configuration
└── README.md # This file
pip install -e ".[dev]"
pytest
To add new tools to the server:
- Define the tool in list_tools() with an appropriate input schema
- Implement the handler in call_tool()
- Update this README with usage examples
The level parameter controls how much metadata to fetch:
- Level 0: Only the requested accession
- Level 1: Direct children (e.g., GSE → GSM)
- Level 2: Grandchildren (e.g., GSE → GSM → SRX)
- Level 3: Great-grandchildren (e.g., GSE → GSM → SRX → SRR)
- Level 4: All downstream accessions (complete hierarchy)
Default: Fetches all levels (equivalent to level 4)
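One way to picture the level parameter is as a depth limit on a walk through the accession hierarchy. The sketch below uses a toy nested dictionary with placeholder accessions; it illustrates the description above and does not reproduce how ffq traverses the databases.

```python
def accessions_up_to_level(node, level):
    """Return the accession of `node` plus descendants down to `level` generations."""
    found = [node["accession"]]
    if level > 0:
        for child in node.get("children", []):
            found.extend(accessions_up_to_level(child, level - 1))
    return found


# Toy hierarchy with placeholder IDs: GSE -> GSM -> SRX -> SRR.
toy_study = {
    "accession": "GSE123456",
    "children": [{
        "accession": "GSM123456",
        "children": [{
            "accession": "SRX000001",
            "children": [{"accession": "SRR123456", "children": []}],
        }],
    }],
}
```

With this toy data, level 0 returns only the series accession, and each additional level adds one more generation until the runs are reached at level 3.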
The server provides detailed error messages for:
- Invalid accession numbers
- Network failures when accessing databases
- Malformed responses from ffq
- Missing dependencies
This MCP server is provided as-is for use with ffq. Please refer to the ffq repository for licensing information about the underlying tool.
Contributions are welcome! Please ensure:
- Code follows Python best practices
- New tools include input validation
- README is updated with examples
- Error handling is comprehensive
Ensure ffq is installed and in your PATH:
which ffq
pip install ffq
Check Python version:
python --version # Should be 3.10+
Install dependencies:
pip install -r requirements.txt
- Verify the accession number is correct
- Check your internet connection
- Try a different metadata level
- Check if the database is accessible (GEO/SRA may have downtime)
Ensure kallisto is installed and in your PATH:
which kallisto
conda install -c bioconda kallisto # or install from source
- For AWS S3 downloads: Ensure AWS CLI is installed (aws --version)
- For GCP downloads: Ensure gsutil is installed
- For FTP/HTTP downloads: Ensure wget is installed
- Try a different provider (e.g., switch from aws to ftp)
- Check your internet connection and firewall settings
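When debugging a failed download, it can help to run the transfer by hand. The sketch below picks a command from the URL scheme; the function name is hypothetical, the use of unsigned requests for public S3 buckets is an assumption, and the server may invoke these tools with different options.

```python
from urllib.parse import urlparse


def build_download_command(url, dest_dir):
    """Choose a download command from the URL scheme (hypothetical helper)."""
    scheme = urlparse(url).scheme
    dest = dest_dir.rstrip("/") + "/"  # trailing slash marks a directory target
    if scheme == "s3":
        # Assumes a public bucket that allows unsigned requests.
        return ["aws", "s3", "cp", "--no-sign-request", url, dest]
    if scheme == "gs":
        return ["gsutil", "cp", url, dest]
    if scheme in ("ftp", "http", "https"):
        return ["wget", "-P", dest, url]
    raise ValueError(f"unsupported URL scheme: {scheme!r}")
```

If the command for one scheme fails while another succeeds, that points at the missing or misconfigured tool rather than at the accession.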
- Verify the index path is correct and accessible
- Build an index from a reference transcriptome:
kallisto index -i transcriptome.idx transcriptome.fa.gz