Simultaneous interpretation (i.e., translating concurrently with the source language speech) is widely used in many scenarios including multilateral organizations (UN/EU), international summits (APEC/G-20), legal proceedings, and press conferences. However, it is well known to be one of the most challenging tasks for humans due to the simultaneous perception and production in two languages. As a result, there are only a few thousand professional simultaneous interpreters world-wide, and each of them can only sustain for 15-30 minutes in each turn. On the other hand, simultaneous translation (either speech-to-text or speech-to-speech) is also notoriously difficult for machines and has remained one of the holy grails of AI. A key challenge here is the word order difference between the source and target languages. For example, if you simultaneously translate German (an SOV language) to English (an SVO language), you often have to wait for the sentence-final German verb. Therefore, most existing "real-time" translation systems resort to conventional full-sentence translation, causing an undesirable latency of at least one sentence, rendering the audience largely out of sync with the speaker. There have been efforts towards genuine simultaneous translation, but with limited success. Recently, at Baidu Research, we discovered a much simpler and surprisingly effective approach to simultaneous (speech-to-text) translation by designing a "prefix-to-prefix" framework tailed to simultaneity requirements. This is in contrast with the "sequence-to-sequence" framework which assumes the availability of the full input sentence. Our approach results in the first simultaneous translation system that achieves reasonable translation quality with controllable latency. Our technique has been successfully deployed to simultaneously translate Chinese speeches into English subtitles at the 2018 Baidu World Conference, and has been demoed live at NeuIPS 2018 Expo Day. Inspired by the success of this very simple approach, we have extended it to produce more flexible translation strategies. Our work has also generated renewed interest in this long-standing problem in the CL community; for instance, two recent papers from Google proposed interesting improvements based on our ideas. Time permitting, I will also discuss our efforts towards the ultimate goal of simultaneous speech-to-speech translation, and conclude with a list of remaining challenges. See demos, media coverage, and more info at: https://simultrans-demo.github.io/ Bio: Liang Huang is Principal Scientist and Head of Institute of Deep Learning USA (IDL-US) at Baidu Research and Assistant Professor (on leave) at Oregon State University. He received his PhD from the University of Pennsylvania in 2008 and BS from Shanghai Jiao Tong University in 2003. He was previously a research scientist at Google, a research assistant professor at USC/ISI, an assistant professor at CUNY, a part-time research scientist at IBM. His research is in the theoretical aspects of computational linguistics. Many of his efficient algorithms in parsing, translation, and structured prediction have become standards in the field, for which he received a Best Paper Award at ACL 2008, a Best Paper Honorable Mention at EMNLP 2016, and several best paper nominations (ACL 2007, EMNLP 2008, ACL 2010, and SIGMOD 2018). He is also a computational biologist where he adapts his parsing algorithms to RNA and protein folding. He is an award-winning teacher and a best-selling author. His work has garnered widespread media attention including Fortune, CNBC, IEEE Spectrum, and MIT Technology Review.
展开▼