Three machine learning algorithms for lexical ambiguity resolution.


Abstract

Lexical ambiguity resolution is a pervasive problem in natural language processing. An important example is target-word choice in machine translation, such as deciding whether the English word "sentence" should be translated into French as "peine" (legal sentence) or "phrase" (grammatical sentence), depending upon analysis of the surrounding context. The same problem arises in text-to-speech synthesis, where the pronunciation of a word such as "lead" (as in "lead role" versus "lead mine") must be resolved through context. Similar problems include capitalization and accent restoration, proper-name classification, and general word-sense disambiguation for many applications.

This dissertation describes three original algorithms for solving this class of problems. The first is a Bayesian discriminator for semantic word classes. It uses statistical models of context to identify the most likely thesaurus category at each position in a document. Sense and translation differences are resolved through these class models. Applications of this work to discourse analysis and language modelling are explored.

The second algorithm is a supervised statistical decision procedure using a variant of decision lists. It offers an efficient mechanism for utilizing diverse, non-independent sources of evidence in a very large parameter space. The dissertation includes empirical studies in language polysemy on which this algorithm and its smoothing procedures are based. The algorithm is evaluated on a wide range of homographs, including ambiguities in text-to-speech synthesis and accent restoration in Spanish and French.

The third algorithm is an essentially unsupervised decision procedure that bootstraps from a small number of seed words automatically extracted from machine-readable dictionaries. The algorithm is driven by the joint exploitation of two empirically studied properties: that words tend to exhibit only one sense in a given collocation and in a given discourse. Accuracy exceeds 96% on diverse test sets. This performance rivals that of previous fully supervised methods while eliminating the need for costly hand-tagged training data, the lack of which has been a severe bottleneck for progress in this area.
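
To make the decision-list idea concrete, here is a minimal Python sketch of a generic two-sense decision list. It is not the dissertation's implementation: the abstract does not give the exact variant or its smoothing procedure, so the additive smoothing constant `alpha`, the bag-of-words context features, the sense labels "leed" and "led", and the toy training data are all assumptions made for illustration. Each context feature is scored by the absolute log-ratio of its smoothed counts under the two senses, the rules are sorted by that score, and a new context is labeled by the single strongest rule that matches it.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(examples, alpha=0.1):
    """Build a two-sense decision list from (feature_set, sense) pairs.

    alpha is an additive smoothing constant (an assumption of this sketch,
    not the smoothing procedure of the dissertation).
    """
    examples = list(examples)
    senses = sorted({sense for _, sense in examples})
    if len(senses) != 2:
        raise ValueError("this sketch handles exactly two senses")
    sense_a, sense_b = senses

    # Count how often each feature co-occurs with each sense.
    counts = defaultdict(Counter)
    for features, sense in examples:
        for feature in features:
            counts[feature][sense] += 1

    rules = []
    for feature, c in counts.items():
        llr = math.log((c[sense_a] + alpha) / (c[sense_b] + alpha))
        if llr == 0:
            continue  # feature carries no evidence either way
        predicted = sense_a if llr > 0 else sense_b
        rules.append((abs(llr), feature, predicted))

    # Strongest evidence first.
    rules.sort(key=lambda rule: rule[0], reverse=True)

    # Fall back to the most frequent sense when no rule matches.
    default = Counter(sense for _, sense in examples).most_common(1)[0][0]
    return rules, default


def classify(rules, default, features):
    """Return the sense of the first (strongest) rule whose feature is present."""
    for _, feature, sense in rules:
        if feature in features:
            return sense
    return default


# Toy data (hypothetical): context words around "lead",
# labeled with its pronunciation.
training = [
    ({"role", "actor"}, "leed"),
    ({"role", "film"}, "leed"),
    ({"pipe", "role"}, "leed"),
    ({"mine", "zinc"}, "led"),
    ({"mine", "pipe"}, "led"),
]
rules, default = train_decision_list(training)
```

Calling `classify(rules, default, {"mine", "shaft"})` on this toy data selects the "led" pronunciation. Because only the strongest matching rule decides the outcome, overlapping features are never multiplied together, which is one way to read the abstract's point about using non-independent sources of evidence.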
