FoMoH: A Framework for EHR Foundation Model Evaluation

Code repository for FoMoH: A clinically meaningful foundation model evaluation for structured electronic health records.

Requirements

Ensure you have the following installed:

Python 3.10
Conda for environment management
Required Python packages (specified below)

Models for evaluation

EHR Foundation Models

CORE-BEHRT
MOTOR
CEHR-GPT
CEHR-BERT
MAMBA (context clues)
LLAMA (context clues)

Baseline Models

FEMR Baselines (Logistic Regression and LightGBM)
MEDS-TAB (Logistic Regression and XGBoost)

Data Source

We used the OMOP Common Data Model (CDM) derived from electronic health record (EHR) data at a large urban academic medical center, encompassing approximately six million patient records with information on diagnoses, medications, procedures, and laboratory tests. We exported the OMOP domain and vocabulary tables from an on-premises SQL Server as Parquet files, harmonized units for all laboratory measurements, and mapped ICD-9 codes to ICD-10 using General Equivalence Mappings. After data cleaning, we converted the OMOP data to the Medical Event Data Standard (MEDS) format to support models that require MEDS as input. Finally, the dataset was split into training, tuning and held-out sets using the 60:10:30 ratio, and the same patient split was used for both OMOP and MEDS datasets. [Add more description and statistics to this section].

Unit Harmonization

Detailed step-by-step instructions are available in the Unit Harmonization Guide. The hyperlink may not work due to anonymization, please refresh the page after clicking the link or go to the corresponding README manually

ICD9 to ICD10 Mapping

Follow the ICD-9 to ICD-10 mapping instructions in the ICD Mapping Guide. The hyperlink may not work due to anonymization, please refresh the page after clicking the link or go to the corresponding README manually

Convert OMOP to MEDS

export OMOP_DIR=""
export OMOP_MEDS=""
export EVALUATION_DIR=""

conda create -n meds_etl python=3.10
conda activate meds_etl
pip install meds_etl==0.3.9
meds_etl_omop $OMOP_DIR $OMOP_MEDS --num_proc 16

Model Summary and Input Data Formats

Model	Required Data Format	Description
CORE-BEHRT	OMOP	A transformer-based model adapted from BEHRT for longitudinal patient representation using OMOP CDM structure.
MOTOR	MEDS	A Transformer model pretrained using a piecewise exponential time-to-event objective across structured EHR sequences in MEDS format.
CEHR-GPT	OMOP	A generative model trained on OMOP-formatted sequences to synthesize realistic patient trajectories and perform downstream predictions.
CEHR-BERT	OMOP	A masked language model tailored for OMOP-based structured EHR data, used for patient representation and embedding learning.
MAMBA	MEDS	A long-context architecture for EHR modeling based on state-space models, optimized for sequential MEDS-formatted data.
LLAMA	MEDS	A large language model adapted to structured EHR inputs via MEDS, used for zero-shot and generative clinical reasoning.
FEMR Baselines	MEDS	A baseline model uses normalized age and event counts from a patient’s history up to prediction time, trained with Logistic Regression and LightGBM.
MEDS-TAB	MEDS	MEDS-TAB extends baseline features using flexible time windows and aggregation functions (e.g., count, value, min, max), then trains Logistic Regression and XGBoost models.

Evaluation Tasks

Phenotypes

The 11 phenotypes include a diverse set of both acute (e.g., myocardial infarction, ischemic stroke) and chronic conditions (e.g., type 2 diabetes, hypertension, schizophrenia), covering a broad range of clinical domains such as cardiovascular, autoimmune, metabolic, oncologic, and psychiatric diseases. This diversity enables comprehensive evaluation of a model's ability to capture different temporal patterns, disease progression, and diagnostic complexity across varying prediction scenarios.

Phenotype	Description
Celiac Disease	A chronic autoimmune disorder triggered by gluten. Defined by diagnosis codes and risk factors like family history and digestive disorders.
Acute Myocardial Infarction (AMI)	An acute heart condition. Identified via inpatient or ER diagnoses of myocardial infarction and related ischemic heart disease indicators.
Systemic Lupus Erythematosus (SLE)	A chronic autoimmune disease. Requires a combination of symptoms, treatment history, and confirmed SLE diagnosis.
Pancreatic Cancer	A chronic malignancy of the pancreas. Identified using pancreatic neoplasm codes, excluding benign tumors.
Hypertension (HTN)	A chronic condition involving elevated blood pressure. Diagnosed via hypertensive disorder codes and related cardiovascular risk factors.
Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD)	A chronic liver condition formerly known as NAFLD. Defined by liver disease codes and metabolic risk factors such as obesity and diabetes.
Ischemic Stroke	An acute neurological event due to arterial blockage. Requires I63 diagnosis codes during inpatient or ER visits.
Osteoporosis	A chronic bone disease leading to fragility. Case cohort based on diagnosis codes; risk factors include age, gender, body weight, and medications.
Chronic Lymphoid Leukemia (CLL)	A chronic blood cancer. Identified using CLL concept sets, with risk factors including age, symptoms, and family history.
Type 2 Diabetes Mellitus (T2DM)	A chronic metabolic disease. Requires diagnosis plus antidiabetic drug use or elevated HbA1c; risk factors include age, obesity, and prediabetes.
Schizophrenia	A chronic psychiatric disorder. Defined by diagnostic transition from psychosis to schizophrenia in patients aged 10–35 with adequate history.

Patient Outcomes

These three patient outcome tasks—in-hospital mortality, 30-day readmission, and prolonged length-of-stay—are closely tied to hospital operations. They reflect critical quality metrics that impact resource allocation, patient safety, and institutional performance, making them essential for both clinical decision support and healthcare management.

Outcome Cohort	Description
In-hospital mortality	Predicts whether a patient will die during a current hospitalization. Includes admissions lasting >48 hours. Prediction is made 48 hours after admission. Patients must have ≥2 years of prior observation.
30-day readmission	Predicts all-cause readmission within 30 days of discharge. Prediction time is at discharge. Patients must have ≥2 years of prior history and not be censored within 30 days post-discharge. Same-day readmissions are excluded.
Prolonged length-of-stay	Predicts whether a hospitalization lasts more than 7 days. Prediction is made 48 hours after admission. Patients must have ≥2 years of prior observation.

Model Evaluation

The EHR foundation models are pre-trained prior to evaluation, while the baseline models are evaluated directly without pretraining. Following the evaluation protocols established in MOTOR and Contexts Clues, we limit our assessments to linear probing, as further fine-tuning of foundation models can be computationally expensive and time-consuming. For each task, we extract patient representations at the prediction time, train a logistic regression model using 5-fold cross-validation on the extracted features and corresponding labels, and report the AUROC on the held-out test. To ensure consistency, we use a fixed random seed for shuffling samples and fitting the logistic regression model.

The hyperlink may not work due to anonymization, please refresh the page after clicking the link or go to the corresponding README manually

MOTOR: MOTOR Pipeline instructions
CEHR-BERT: CEHR-BERT Pipeline instructions
CEHR-GPT: CEHR-GPT Pipeline instructions
CORE-BEHRT: CORE-BEHRT Pipeline instructions
FEMR Baselines: FEMR Baseline Pipeline instructions
MEDS-TAB Baselines: MEDS-TAB Baseline Pipeline instructions
MAMBA-Transport Baselines: MAMBA-Transport Pipeline instructions
MAMBA / LLAMA: MAMBA / LLAMA Pipeline instructions

Name		Name	Last commit message	Last commit date
Latest commit History 71 Commits
src		src
.gitignore		.gitignore
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FoMoH: A Framework for EHR Foundation Model Evaluation

Table of Contents

Requirements

Models for evaluation

EHR Foundation Models

Baseline Models

Data Source

Unit Harmonization

ICD9 to ICD10 Mapping

Convert OMOP to MEDS

Model Summary and Input Data Formats

Evaluation Tasks

Phenotypes

Patient Outcomes

Model Evaluation

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

FoMoH: A Framework for EHR Foundation Model Evaluation

Table of Contents

Requirements

Models for evaluation

EHR Foundation Models

Baseline Models

Data Source

Unit Harmonization

ICD9 to ICD10 Mapping

Convert OMOP to MEDS

Model Summary and Input Data Formats

Evaluation Tasks

Phenotypes

Patient Outcomes

Model Evaluation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages