This repository was archived by the owner on Nov 21, 2023. It is now read-only.

Topics #39

@jaspock

According to the FLORES-101 paper, "we manually labeled all sentences by a more detailed sub-topic, one of 10 possibilities: crime, disasters, entertainment, geography, health, nature, politics, science, sports, and travel". Table 1 in the paper includes the statistics of these different sub-topics. However, the metadata files contain a much larger number of sub-topics (306, in fact), such as:

Accident
accidents
accordion/right hand
advanced interactive media
Alchohol
American education/forgotten half/Foster care
American education/Special Needs ADD
...
ancient china/government
Ancient Civilizations/Romans
Ancient_Civilizations/Assyrians
...
big cats
big cats, lion
big cats, ocelot
big cats, tiger
Blended Learning/Blogging
Blended Learning/Field trips
Bugs/Insects_Intro
business
castles of england/tudor castles
castles of english/development of castles
climate
...

Is the 10-class metadata available for download, or are there any recommendations on how to group the existing sub-topics into a smaller number of topics?

The list of the 306 topics may be easily obtained with:

cat metadata_dev* | cut -f 3 | sort | uniq
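As a possible starting point for the grouping question, here is a rough sketch (not an official mapping, and not the 10-class labelling from the paper) of how the raw labels could be normalized and counted before assigning them to coarser topics by hand. It assumes, as the command above does, that the topic is the third tab-separated column of the `metadata_dev*` files; the names `normalize_topic` and `count_topics` are hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def normalize_topic(raw: str) -> str:
    """Collapse spelling variants of a sub-topic label into one coarse key."""
    # Keep only the part before the first "/" or ","
    # (e.g. "big cats, lion" -> "big cats").
    head = raw.split("/")[0].split(",")[0]
    # Treat underscores as spaces, lowercase, and squeeze repeated whitespace.
    return " ".join(head.replace("_", " ").lower().split())


def count_topics(paths, column=2):
    """Count normalized labels found in the given tab-separated files.

    `column` is zero-based; 2 is an assumption matching `cut -f 3`.
    A header row, if present, is counted like any other row.
    """
    counts = Counter()
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                if len(row) > column:
                    counts[normalize_topic(row[column])] += 1
    return counts


if __name__ == "__main__":
    # Assumed file location: the current directory, as in the shell command.
    files = sorted(Path(".").glob("metadata_dev*"))
    for topic, n in count_topics(files).most_common():
        print(f"{n}\t{topic}")
```

This only merges variants such as "Ancient Civilizations/Romans" and "Ancient_Civilizations/Assyrians"; labels like "Accident" and "accidents" still end up under different keys, so a manual mapping to the 10 classes would still be needed.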
