CLMS News

Figure (bar chart): CLMS applications by cycle from 2011/2012 to 2019/2020

The computational linguistics/natural language processing industry has continued to grow, leading to increasing demand for people trained in this area. The CLMS program, established in 2005 and well known across the US and beyond, has been well-placed to attract large and increasing numbers of highly qualified applicants.

Why UW computational linguistics is so successful

In light of steadily increasing numbers of applicants, in 2019 the program expanded its faculty from three to four with the arrival of Shane Steinert-Threlkeld. 

Emily M. Bender’s 2020 podcasts on Natural Language Processing Ethics [https://soundcloud.com/nlp-highlights/106-ethical-considerations-in-nlp-research-emily-bender], Linguistics in NLP [https://twimlai.com/twiml-talk-376-is-linguistics-missing-from-nlp-research-w-emily-m-bender/], and Linguistics in Artificial Intelligence [https://www.radicalai.org/e16-emily-bender] have a broad audience. Bender also organized an international online workshop (May 11-13), "Data Statements for NLP: Towards Best Practices" [https://sites.google.com/uw.edu/data-statements-for-nlp/], together with Prof. Batya Friedman of the iSchool and PhD student Angelina McMillan-Major. The workshop was sponsored by UW's Tech Policy Lab [https://techpolicylab.uw.edu/].

Gina Levow has been involved with two projects at the intersection of computational linguistics and language documentation. (1) LanguageNet Lexicons [http://uakari.ling.washington.edu/languagenet/available/] (supported by the DARPA Low Resource Languages for Emerging Incidents program, LORELEI) is a publicly available, massively multilingual online lexical resource spanning more than 1800 languages. It was created to support rapid development of computational tools, such as machine translation or information extraction, for low-resource languages. (2) ASR24 [https://github.com/uiuc-sst/asr24] (also supported by DARPA LORELEI) builds an Automatic Speech Recognition (ASR) system to transcribe speech from a previously unseen language in less than 24 hours. The system requires only a small corpus of text in the new language and a pronunciation model that can be readily created from a typical Wikipedia page for the language. The system has been successfully applied to Arabic, Assamese, Kinyarwanda, Russian, Sinhalese, Swahili, Tagalog, Tamil, Zulu, Ilocano, and Odia.