diff --git a/corpora_and_data_resources.rst b/corpora_and_data_resources.rst
index 143f961..919614d 100644
--- a/corpora_and_data_resources.rst
+++ b/corpora_and_data_resources.rst
@@ -176,6 +176,7 @@ Natural Language Inference (NLI)
 
 
 * `Hebrew Paraphrase Dataset `_ {`CC BY 4.0 `_} - A high-quality paraphrase dataset in Hebrew, consisting of 9785 instances. The dataset includes both paragraph-level (75%) and sentence-level (25%) paraphrases generated with the help of a large language model. Among these, 300 instances have been manually validated as gold standard examples.
+* `LCHAIM Dataset `_ {`CC BY 4.0 `_} - A long-context, multi-premise NLI dataset, translated and validated from CoNTRoL, consisting of 8,325 pairs in Hebrew. Published at ACL 2025.
 
 Paraphrase Detection and Generation
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/models_tools_services.rst b/models_tools_services.rst
index 18a8697..878c2bb 100644
--- a/models_tools_services.rst
+++ b/models_tools_services.rst
@@ -348,6 +348,8 @@ Fine-Tuned Language Models
 
 * `Universal Language Model Fine-tuning for Text Classification (ULMFiT) in Hebrew `_ - The weights (e.g. a trained model) for a Hebrew version for Howard's and Ruder's ULMFiT model. Trained on the Hebrew Wikipedia corpus.
 
+* `LongHero-LCHAIM `_ - A LongHero model, fine-tuned on HebNLI and then on the LCHAIM long-context NLI dataset.
+
 Multilingual Models
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 