Three machine learning algorithms for lexical ambiguity resolution.

Abstract

Lexical ambiguity resolution is a pervasive problem in natural language processing. An important example is target-word choice in machine translation, such as deciding whether the English word "sentence" should be translated into French as "peine" (legal sentence) or "phrase" (grammatical sentence), depending on analysis of the surrounding context. The same problem arises in text-to-speech synthesis, where pronunciations such as "lead role" and "lead mine" must be resolved through context. Similar problems include capitalization and accent restoration, proper-name classification, and general word-sense disambiguation for many applications.

This dissertation describes three original algorithms for solving this class of problems. The first is a Bayesian discriminator for semantic word classes. It uses statistical models of context to identify the most likely thesaurus category at each position in a document. Sense and translation differences are resolved through these class models. Applications of this work to discourse analysis and language modelling are explored.

The second algorithm is a supervised statistical decision procedure using a variant of decision lists. It offers an efficient mechanism for utilizing diverse, non-independent sources of evidence in a very large parameter space. The dissertation includes empirical studies in language polysemy on which this algorithm and its smoothing procedures are based. The algorithm is evaluated on a wide range of homographs, including ambiguities in text-to-speech synthesis and accent restoration in Spanish and French.

The third algorithm is an essentially unsupervised decision procedure that bootstraps from a small number of seed words automatically extracted from machine-readable dictionaries. The algorithm is driven by the joint exploitation of two empirically studied properties: that words tend to exhibit only one sense in a given collocation and in a given discourse. Accuracy exceeds 96% on diverse test sets. This performance rivals that of previous fully supervised methods while eliminating the need for costly hand-tagged training data, the lack of which has been a severe bottleneck for progress in this area.
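To make the decision-list idea concrete, here is a minimal Python sketch of a two-sense decision list. It is an illustration only, not the dissertation's procedure: the abstract does not give the scoring function or smoothing method, so the log-likelihood-ratio ranking, the add-alpha smoothing, the bag-of-words features, and all names (`train_decision_list`, `classify`, `alpha`) and the toy training data are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(examples, alpha=0.1):
    """Build a ranked rule list from (context_words, sense) pairs.

    Assumes exactly two senses. Each rule is (strength, word, sense),
    where strength is the absolute smoothed log-likelihood ratio
    (an assumed scoring choice, not taken from the abstract).
    """
    senses = sorted({sense for _, sense in examples})
    if len(senses) != 2:
        raise ValueError("this sketch handles exactly two senses")
    a, b = senses

    counts = defaultdict(Counter)  # word -> sense -> number of contexts
    for words, sense in examples:
        for w in set(words):
            counts[w][sense] += 1

    rules = []
    for w, c in counts.items():
        llr = math.log((c[a] + alpha) / (c[b] + alpha))
        if llr == 0.0:
            continue  # word is equally common with both senses: no evidence
        rules.append((abs(llr), w, a if llr > 0 else b))

    # Strongest evidence first; ties broken alphabetically for determinism.
    rules.sort(key=lambda r: (-r[0], r[1]))
    return rules


def classify(rules, words, default):
    """Return the sense of the first (strongest) rule that matches."""
    present = set(words)
    for _, w, sense in rules:
        if w in present:
            return sense
    return default


# Hypothetical toy data for the "sentence" example from the abstract.
examples = [
    (["the", "judge", "passed", "sentence"], "legal"),
    (["prison", "sentence", "judge"], "legal"),
    (["grammatical", "sentence", "verb"], "grammar"),
    (["verb", "in", "the", "sentence"], "grammar"),
]
rules = train_decision_list(examples)
print(classify(rules, ["a", "judge", "spoke"], default="grammar"))
```

Only the single strongest matching rule decides each case, so overlapping, non-independent features do not need to be combined. This is one way to read the abstract's remark about using diverse, non-independent sources of evidence.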

Bibliographic record

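Author: Yarowsky, David Eric
Author affiliation: University of Pennsylvania
Degree-granting institution: University of Pennsylvania
Subject: Computer Science; Information Science; Artificial Intelligence
Degree: Ph.D.
Year: 1996
Pages: 179 p.
Total pages: 179
Original format: PDF
Language: eng
Chinese Library Classification: Automation and computer technology; Information and knowledge dissemination; Artificial intelligence theory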

