[Ampln] AmericasNLP 2024 Shared Task : Creation of Educational Materials for Indigenous Languages

Alejandro Molina Villegas amolina at centrogeo.edu.mx
Tue Jan 30 11:01:01 CST 2024


#### AmericasNLP 2024 Shared Task : Creation of Educational Materials for
Indigenous Languages ####

First Call for Participation

The AmericasNLP 2024 shared task on the creation of educational materials
for Indigenous languages is a competition aimed at encouraging the
development of natural language processing (NLP) systems to help with the
teaching and dissemination of Indigenous languages of the Americas.
Participants will build systems that automatically create exercises by
converting a base sentence into another sentence that differs in one
specific property (such as negation or tense). Systems
submitted to the shared task will be presented at the Fourth Workshop on
NLP for Indigenous Languages of the Americas (AmericasNLP) in June 2024,
which will be co-located with the Annual Meeting of the North American
Chapter of the Association for Computational Linguistics (NAACL 2024) and
held in Mexico City.

Why?

Many of the Indigenous languages of the Americas are vulnerable or
endangered. This means that, depending on the language, no or only a few
children are learning them and, generally, they are only spoken by a few
small groups of people. Because of this, these languages are at a high risk
of becoming extinct in the near future. Many communities are carrying out
revitalization efforts, including teaching their languages to their
community members. Creating materials to teach these languages is an urgent
priority, but this process is expensive and time-consuming. NLP presents an
opportunity to help with these efforts.

In addition to being endangered, most Indigenous languages of the Americas
are so-called low-resource languages: the data needed to train any NLP
systems, let alone deep learning-based systems, is severely limited. This
means that many approaches used for high-resource languages, such as
English and Chinese, are not directly applicable or perform poorly.
Finally, many Indigenous languages exhibit linguistic properties uncommon
among languages frequently studied in NLP. This constitutes an additional
difficulty. The goal of AmericasNLP is to motivate researchers to take on
the challenge of developing systems for these Indigenous languages.

How?

AmericasNLP invites the submission of results obtained by systems built for
the creation of educational materials for Indigenous languages.
Participants can use the training and development data we provide and there
are no limits on what additional resources participants may use. If
participants want to leverage additional data to improve their systems,
that's great! If they want to use pretrained models, that's great, too! The
only limitation is that we ask participants to not create the test outputs
manually or train on the development or test sets.

In this shared task, participants will be given a dataset with base
sentences. The dataset will also contain an indication of the change we
expect systems to make to each base sentence. Systems will transform the
base sentence into a target sentence according to the indicated change.

Base sentence: Ye' shka' (Bribri for "I walked")

Expected change: Polarity: Negative

Target sentence: Ye' kë̀ shkàne̠ (Bribri for "I didn't walk")
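
To make the structure of one item concrete, here is a minimal Python sketch of
the example above. The class name, field names, and the parse_change helper are
hypothetical and chosen for illustration only; the actual file format of the
released data may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exercise:
    """One item: a base sentence, the requested change, and the target.

    Field names are assumptions for illustration, not the official schema.
    """
    base: str
    change: str
    target: str


def parse_change(change: str) -> tuple[str, str]:
    """Split a change label such as 'Polarity: Negative' into (property, value)."""
    prop, _, value = change.partition(":")
    return prop.strip(), value.strip()


# The Bribri example from the call, stored as a single item.
example = Exercise(
    base="Ye' shka'",
    change="Polarity: Negative",
    target="Ye' kë̀ shkàne̠",
)

prop, value = parse_change(example.change)
print(f"{example.base} -> {example.target} [{prop} = {value}]")
```

A system for the task would take the base sentence and the change as input and
would be expected to produce the target sentence as output.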

The main metric of the shared task is accuracy. Participants can enter the
competition for as many languages as they like, and systems for every
language will be evaluated separately, in addition to the overall average
score, which will be used to determine the shared task’s winner. We provide
an evaluation script and a baseline system to help participants get started
quickly. If you are interested in this shared task, please register using
this Google form: <https://forms.gle/RQztkDM7ddziM6eP7>
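
The scoring described above (accuracy per language plus an overall average) can
be sketched in Python as follows. This is not the official evaluation script.
It assumes that accuracy means exact string match after trimming surrounding
whitespace and that the overall score is the unweighted mean of the
per-language accuracies; both are assumptions, and the function names are
hypothetical.

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference.

    Assumption: exact match after stripping surrounding whitespace.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


def score_by_language(results):
    """Compute per-language accuracy and their unweighted mean.

    `results` maps a language name to a (predictions, references) pair.
    """
    per_language = {
        lang: accuracy(preds, refs) for lang, (preds, refs) in results.items()
    }
    overall = sum(per_language.values()) / len(per_language) if per_language else 0.0
    return per_language, overall


# Toy inputs for illustration only; these are not real task data.
toy_results = {
    "bribri": (["a", "b"], ["a", "c"]),
    "maya": (["x"], ["x"]),
}
per_language, overall = score_by_language(toy_results)
print(per_language, overall)
```

For real submissions, use the evaluation script provided by the organizers,
since its matching rules may differ from the ones assumed here.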

Which languages?

The following languages are featured in the AmericasNLP 2024 shared task on
the creation of educational materials for Indigenous languages (AmericasNLP
2024 Shared Task 2):

   - Bribri from Costa Rica
   - Guarani from Paraguay
   - Maya from Mexico

All data and baseline systems will be made available in this GitHub
repository:
<https://github.com/AmericasNLP/americasnlp2024/tree/master/ST2_EducationalMaterials>

Important Dates:

   - Release of pilot data: January 29, 2024
   - Release of training and development sets: February 5, 2024
   - Release of baseline systems and baseline results: February 12, 2024
   - Release of test inputs: April 1, 2024
   - Submission of results (shared task deadline): April 10, 2024
   - Announcement of winners: April 12, 2024
   - Submission of system description papers: April 19, 2024
   - Notification of acceptance: April 22, 2024
   - Camera-ready papers due: April 26, 2024

All deadlines are 11:59 pm UTC-12 (Anywhere on Earth, AoE).

Organizers

Manuel Mager, Pavel Denisov, Silvia Fernandez Sabido, Samuel Canul Yah,
Alejandro Molina-Villegas, Lorena Hau Ucán, Arturo Oncevay, Rolando
Coto-Solano, Luis Chiruzzo, Marvin Agüero-Torales, Aldo Alvarez, Katharina
von der Wense

Contact: americas.nlp.workshop at gmail.com
Website: https://turing.iimas.unam.mx/americasnlp/2024_st.html

--
Alejandro Molina Villegas
CONAHCYT Researcher - Centro de Investigación en Ciencias de Información
Geoespacial
Parque Científico y Tecnológico de Yucatán, Mérida
Tel. 01 (999) 688 53 00 Ext. 1005
www.centrogeo.org.mx/amolina
<https://www.centrogeo.org.mx/areas-profile/amolina#investigacion>