LCHAIM (Long Context Hebrew with Advanced reasoning Inference Model Benchmark) is the first Hebrew benchmark designed to evaluate Natural Language Inference (NLI) over long contexts requiring complex reasoning skills such as coreference, temporal, logical, and analytical inference.
📍 Official repository for the paper:
LCHAIM - Investigating Long Context Reasoning in Hebrew
Ehud Malul*, Oriel Perets*, Ziv Mor, Yigal Kassel, Elior Sulem
📍 Findings of the Association for Computational Linguistics: ACL 2025
📄 Paper Link
LCHAIM is a translated and validated Hebrew version of the ConTRoL dataset, designed to assess how well Hebrew language models handle NLI tasks involving:
- Long premise passages (multi-paragraph)
- Complex reasoning:
  - Coreferential
  - Temporal
  - Logical
  - Analytical
The dataset contains 8,325 Hebrew premise-hypothesis pairs, labeled as:
- Entailment
- Contradiction
- Neutral
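
As a rough illustration of the three-way labeling above, here is a minimal sketch of how a single premise-hypothesis pair might be represented in Python. The class name `NLIExample`, its field names, and the lowercase label strings are assumptions made for this example; they are not the repository's actual schema.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed label strings for illustration; the released files may encode labels differently.
LABELS = ("entailment", "contradiction", "neutral")


@dataclass(frozen=True)
class NLIExample:
    """One premise-hypothesis pair (hypothetical schema)."""
    premise: str      # long, possibly multi-paragraph Hebrew passage
    hypothesis: str   # short Hebrew statement to judge against the premise
    label: str        # one of LABELS

    def __post_init__(self):
        # Reject anything outside the three NLI classes.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


def label_distribution(examples):
    """Count how many examples carry each label."""
    return Counter(ex.label for ex in examples)


# Toy data only; these are placeholders, not real LCHAIM entries.
toy = [
    NLIExample("premise text A", "hypothesis 1", "entailment"),
    NLIExample("premise text A", "hypothesis 2", "neutral"),
    NLIExample("premise text B", "hypothesis 3", "neutral"),
]
distribution = label_distribution(toy)
```

Validating the label at construction time means a malformed row fails early, before it can skew the class counts.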
We evaluated:
- 🧠 AlephBERT
- 🦸 LongHero
- 🤖 LLMs: GPT-4o, Dicta-LM 2.0, Gemma-9B
The best result, 52% accuracy, came from LongHero fine-tuned on HebNLI and LCHAIM. Human accuracy was ~85%, a gap of roughly 33 percentage points that shows how far current Hebrew NLU models remain from human performance on this task.
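
For reference, accuracy here means the fraction of pairs whose predicted label matches the gold label. The sketch below shows that computation on toy labels; the function name `accuracy` and the label strings are assumptions for illustration, and this is not the evaluation script used in the paper.

```python
def accuracy(gold, predicted):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Toy labels only; not real model output.
gold = ["entailment", "neutral", "contradiction", "neutral"]
predicted = ["entailment", "neutral", "neutral", "neutral"]
score = accuracy(gold, predicted)  # 3 of 4 correct -> 0.75
```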
@inproceedings{malul2025lchaim,
title={LCHAIM-Investigating Long Context Reasoning in Hebrew},
author={Malul, Ehud and Perets, Oriel and Mor, Ziv and Kassel, Yigal and Sulem, Elior},
booktitle={Findings of the Association for Computational Linguistics: ACL 2025},
pages={7928--7939},
year={2025}