Parse decoration of the word sequence in the speech-to-text machine-translation pipeline.

Abstract

Parsing, or the extraction of syntactic structure from text, is appealing to natural language processing (NLP) engineers and researchers. Parsing provides an opportunity to consider information about word sequence and relatedness beyond simple adjacency. This dissertation uses automatically derived syntactic structure (parse decoration) to improve the performance and evaluation of large-scale NLP systems that have (in general) used only word-sequence level measures to quantify success. In particular, this work focuses on parse structure in the context of large-vocabulary automatic speech recognition (ASR) and statistical machine translation (SMT) in English and (in translation) Mandarin Chinese. The research here explores three characteristics of statistical syntactic parsing: dependency structure, constituent structure, and parse uncertainty, the last by making use of the parser's ability to generate an M-best list of parse hypotheses.

Parse structure predictions are applied to ASR to improve word-error rate over a baseline non-syntactic (sequence-only) language model (achieving 6-13% of possible error reduction). Critical to this success is the joint reranking of an N x M-best list of N ASR hypothesis transcripts and M-best parse hypotheses (for each transcript). Jointly reranking the N x M lists is also demonstrated to be useful in choosing a high-quality parse from these transcriptions.

In SMT, this work demonstrates expected dependency pair match (EDPM), a new mechanism for evaluating the quality of SMT translation hypotheses by comparing them to reference translations. EDPM, which makes direct use of parse dependency structure in its measurement, is demonstrated to correlate better with human measurements of translation quality than the competing (and widely used) evaluation metrics BLEU4 and translation edit rate.

Finally, this work explores how syntactic constituents may predict or improve the behavior of unsupervised word-aligners, a core component of SMT systems, over a collection of Chinese-English parallel text with reference alignment labels. Statistical word-alignment is improved over several machine-generated alignments by exploiting the coherence of certain parse constituent structures to identify source-language regions where a high-recall aligner may be trusted.

These diverse results across ASR and SMT point together to the utility of incorporating parse information into large-scale (and generally word-sequence oriented) NLP systems and demonstrate several approaches for doing so.
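
The joint reranking idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the dissertation's reranker: it assumes each of the N transcripts carries a single ASR log score and each of its M parses a single parser log-probability, and it combines them with one hand-set weight. The names `Transcript`, `Parse`, `joint_rerank`, and `parse_weight`, and all scores and sentences in the example, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Parse:
    tree: str        # bracketed parse string (illustrative placeholder)
    log_prob: float  # parser log-probability (assumed to be available)


@dataclass
class Transcript:
    words: str            # one ASR hypothesis
    asr_score: float      # ASR log score (assumed to be available)
    parses: List[Parse]   # M-best parses of this hypothesis


def joint_rerank(
    nbest: List[Transcript], parse_weight: float = 0.5
) -> Tuple[Transcript, Parse, float]:
    """Pick the best (transcript, parse) pair from the N x M candidates.

    Each pair is scored as asr_score + parse_weight * parse log_prob.
    A single linear weight is an assumption made for this sketch.
    """
    best = None
    best_score = float("-inf")
    for transcript in nbest:
        for parse in transcript.parses:
            score = transcript.asr_score + parse_weight * parse.log_prob
            if score > best_score:
                best_score = score
                best = (transcript, parse)
    if best is None:
        raise ValueError("no (transcript, parse) candidates to rerank")
    return best[0], best[1], best_score


# Hypothetical 2 x 2 candidate list: the second transcript has the better
# ASR score, but its parses are much less probable.
nbest = [
    Transcript("i saw the man with a telescope", -10.0,
               [Parse("(S parse-a)", -4.0), Parse("(S parse-b)", -6.0)]),
    Transcript("i saw them and with a telescope", -9.5,
               [Parse("(S parse-c)", -8.0), Parse("(S parse-d)", -9.0)]),
]

best_transcript, best_parse, best_score = joint_rerank(nbest, parse_weight=0.5)
```

With `parse_weight=0.5` the first transcript wins because its best parse is far more probable, while `parse_weight=0.0` reduces to picking the top ASR hypothesis. The same call also returns the selected parse, which mirrors the abstract's point that joint reranking helps choose a parse as well as a transcript.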

Bibliographic record

  • Author

    Kahn, Jeremy G.

  • Author affiliation

    University of Washington

  • Degree-granting institution: University of Washington
  • Subjects: Language Linguistics; Engineering Electronics and Electrical; Artificial Intelligence; Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pagination: 128 p.
  • Total pages: 128
  • Original format: PDF
  • Language of text: English (eng)

  • Date added to database: 2022-08-17 11:37:09
