PS and KM basic support by dgant · Pull Request #17 · facebookresearch/flores

dgant · 2020-03-13T20:36:13Z

Modified download script to acquire PS/KM/HI/FA
Added prepare.sh for PS/KM parallel and multilingual training

dgant · 2020-03-13T20:37:07Z

+cat "${DEVTEST_HI}/dev.hi"          > $DATA/valid.hi-en.hi
+cat "${DEVTEST_HI}/dev.en"          > $DATA/valid.hi-en.en
+cat "${DEVTEST_HI}/test.hi"         > $DATA/test.hi-en.hi
+cat "${DEVTEST_HI}/test.en"         > $DATA/test.hi-en.en


DEVTEST_PSKM and DEVTEST_HI are placeholders for where these datasets will eventually live.

facebook-github-bot · 2020-05-15T21:09:59Z

Hi @dgant!

Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but we do not have a signature on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

guzmanhe

Thanks for this PR.
I've added a few comments. Most importantly, please check the contributor license process.

guzmanhe · 2022-04-14T16:18:59Z

+download_opus_data $KM_ROOT $KM_TGT
+download_opus_data $PS_ROOT $PS_TGT
+download_opus_data $FA_ROOT $FA_TGT
+#download_opus_data $HI_ROOT $HI_TGT


Please remove the commented-out code.

guzmanhe · 2022-04-14T16:20:52Z

+  elif [ "$TGT" = "ps" ]; then
+    URLS=("${PS_OPUS_URLS[@]}")
+    DATASETS=("${PS_OPUS_DATASETS[@]}")
+  elif [ "$TGT" = "fa" ]; then


We don't have Farsi in the original flores. Is there a reason why you are including it as a target?

dgant

I believe the intent was to improve the Pashto performance of a multilingual model by also training on Farsi, a related language. The Pashto-English parallel corpus was very limited, so being able to use the Farsi-English corpus would hopefully be a boon. I don't recall whether we tested this or, if we did, whether it had an impact.

guzmanhe · 2022-04-14T16:24:30Z

@@ -0,0 +1,90 @@
+import argparse


Can you add more information about the intended use of this script?
Deduplication is certainly useful, but newcomers might not be so familiar
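
For readers unfamiliar with the idea, here is a minimal sketch of what deduplicating a parallel corpus can look like. It is not the script from this PR, whose body is not shown here beyond its first line. The function names `dedup_parallel` and `dedup_files`, and the choice to treat a sentence pair as a duplicate only when both sides match after stripping whitespace, are assumptions made for illustration.

```python
def dedup_parallel(src_lines, tgt_lines):
    """Drop repeated (source, target) pairs, keeping the first occurrence of each.

    Hypothetical helper: a pair counts as a duplicate only if BOTH sides match
    after surrounding whitespace is stripped. The original order is preserved.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")

    seen = set()
    kept_src, kept_tgt = [], []
    for src, tgt in zip(src_lines, tgt_lines):
        key = (src.strip(), tgt.strip())
        if key in seen:
            continue  # this exact pair was already kept
        seen.add(key)
        kept_src.append(src)
        kept_tgt.append(tgt)
    return kept_src, kept_tgt


def dedup_files(src_path, tgt_path, out_src_path, out_tgt_path):
    """Read two line-aligned files, deduplicate them jointly, write the result."""
    with open(src_path, encoding="utf-8") as f:
        src_lines = f.read().splitlines()
    with open(tgt_path, encoding="utf-8") as f:
        tgt_lines = f.read().splitlines()

    kept_src, kept_tgt = dedup_parallel(src_lines, tgt_lines)

    with open(out_src_path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in kept_src)
    with open(out_tgt_path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in kept_tgt)
```

The two sides are deduplicated together rather than file by file, because removing repeats from each file independently would break the line alignment between source and target.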

dgant · 2022-04-14T22:20:29Z

Hi @guzmanhe. I opened this PR while I was working at FAIR with Peng-Jen and Marc'Aurelio. I left FB two years ago, so my contributor license agreement status has lapsed :). I won't be making any further changes to this pull request, so if there's no interest on your end in seeing it through, you can close it.

Modified download script to acquire PS/KM/HI/FA. Added prepare.sh for PS/KM parallel and multilingual training.

62c2fcf

facebook-github-bot added the CLA Signed label Mar 13, 2020

Added de-duplication script

feb88e1

ShaidaMuhammad approved these changes Feb 7, 2021

putheakhem mentioned this pull request Mar 30, 2021

Broken download link #20

Closed

guzmanhe suggested changes Apr 14, 2022

dgant wants to merge 2 commits into facebookresearch:main from dgant:add-pskm

4 participants
