Welcome back, Will Lewis!

Submitted by Joyce Parvi

UW Linguistics welcomes affiliate Prof. William D. Lewis back to campus in Spring 2023 to teach LING 575G-I, Language Technologies for Crisis Response, for the computational linguistics program. Lewis, a co-founder of UW’s Computational Linguistics Master’s Program (along with Emily M. Bender and Fei Xia), taught at UW from 2005 to 2007. He then worked at Microsoft on the Translator team until 2020, eventually becoming Principal PM Architect for Microsoft Translator.

Language Technologies for Crisis Preparedness and Response (LT4CPR) is currently Lewis’s primary research focus. How Lewis came to be an expert in this topic is an interesting story. During his first eight years at Microsoft, Lewis focused on developing and shipping new languages for Translator. Finding and consuming parallel data (parallel content between a source and a target language) for Machine Translation (MT) became increasingly complicated during Lewis’s time at Microsoft, in two very different ways. First, for very high-resource languages, there was often too much data to train machine translation models on, which meant filtering the data down to a trainable quantity. Second, for low-resource languages, the predicament was too little training data. In these cases, the team had to turn to increasingly novel sources of training data, such as data provided by governments, language learning materials, Automatic Speech Recognition (ASR) dictionaries between related languages (e.g., between French and Haitian Kreyòl), or even data harvested from publications on low-resource languages (e.g., ODIN).

The need for novel sources of training data came to the fore when Lewis led the Translator team’s efforts to develop translation models for Haitian Kreyòl in the days following the massive Haitian earthquake of 2010, when responding rapidly with NLP and MT technologies was essential for saving lives. Haitian Kreyòl was shipped as an official language of the Microsoft Translator product in less than five days, starting from scratch, with no knowledge of the language, and without any resources. (This is still a record.)

The Haitian Kreyòl project proved to be pivotal in Lewis’s career and sent him down two different but related research paths. First, it stimulated his interest in shipping increasingly low-resource languages, including White Hmong, Querétaro Otomí, Yucatec Maya, Welsh, Canadian French and Inuktitut. Lewis also led Microsoft’s efforts to develop speech translation models for Levantine Arabic, which helped with the Syrian war refugee crisis. Second, Lewis saw the need for and utility of automated translation in crisis response scenarios. He co-developed the Crisis MT Cookbook, which described a method for creating resources that could facilitate the development of rapid-response translation models for any of the world’s languages. That method was put to the test during the COVID-19 pandemic in the Translation Initiative for COVID-19.

Crucially, although the NLP and MT communities have responded to crises in the past, they have mostly done so without collaborating with the crisis response research communities or with relevant aid agencies. LT4CPR seeks to bridge these gaps and to establish crisis response as a crucial area for language technology research.

Congratulations to Lewis, Xia and the other members of the research team, whose project, Language Technologies for Crisis Preparedness and Response (LT4CPR), was recently funded by the NSF.