Helsinki-NLP/translate-fineweb

translate-fineweb

Automatically translated documents from fineweb-edu and nemotron-cc-hq. Translations are based on OPUS-MT and HPLT-MT models.

First edition

The translated data sets below are based on preliminary runs with slightly erroneous segmentation. Links are available in README-2024-06-26.md.

Small test sample translations

We tested the approach with a small 10K sample taken from Nemotron-CC. The data is available from a separate list here.

release files for maxidl/nemotron-cc-english-run1

release files for maxidl/nemotron-cc-english-run2

release files for fineweb-edu/350BT
