AGGREGATION Project

Fei Xia and Emily M. Bender, Co-PIs. AGGREGATION Project: Automatic Generation of Grammars for Endangered Languages from Glosses and Typological Information.

Implemented grammars can contribute to endangered language documentation in several ways. First, the grammars themselves are a rich complement to prose descriptive grammars, allowing linguists to explore analyses at a level of precision not usually achieved in prose descriptions. Second, implemented grammars can be used to create treebanks, that is, collections of utterances (from running text or elicited examples) paired with syntactic and semantic structures. The process of creating a treebank can give the field linguist important feedback about aspects of the linguistic data not covered by current analyses. The resulting treebanks can be used to build further computational tools and are also a rich source of comparable data for qualitative and quantitative work in typology, grounding higher-level linguistic abstractions in actual utterances in a computationally tractable fashion. Despite these advantages, grammar engineering for language documentation has gone largely unexplored. In this project, we investigate how to automate the construction of grammar fragments, building on interlinear glossed text (IGT) and the LinGO Grammar Matrix, a typologically motivated cross-linguistic computational resource.

Read more at the AGGREGATION Project website.