Annotation and verification of sense pools in OntoNotes

Abstract

This paper describes OntoNotes, a multilingual (English, Chinese, and Arabic) corpus with large-scale semantic annotations, including predicate-argument structure, word senses, ontology linking, and coreference. The underlying semantic model of OntoNotes groups word senses into so-called sense pools, i.e., sets of near-synonymous word senses. Such information is useful for many applications, including query expansion in information retrieval (IR) systems, (near-)duplicate detection in text summarization systems, and alternative word selection in writing support systems. Although a sense pool provides a set of near-synonymous word senses, it remains unknown whether two words in the same pool are interchangeable in practical use. This paper therefore devises an unsupervised algorithm that combines Google n-grams with a statistical test to determine whether a word in a pool can be substituted by other words in the same pool. The n-gram features measure the degree of context mismatch caused by a substitution, and the statistical test then determines, based on that degree of mismatch, whether the substitution is adequate. The proposed method is compared with a supervised method, Linear Discriminant Analysis (LDA). Experimental results show that the proposed unsupervised method achieves performance comparable to that of the supervised method.
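The abstract does not specify which n-gram orders, mismatch measure, or statistical test the authors use, so the following is only a minimal sketch of the general idea under stated assumptions. A small in-memory dictionary (`TOY_COUNTS`, hypothetical, invented numbers) stands in for Google n-gram counts; the mismatch of each context n-gram is assumed to be the log ratio of original to substituted counts; and a one-sided sign test, chosen here for illustration and not necessarily the paper's test, decides whether substituted contexts are systematically rarer. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for Google n-gram counts: token tuple -> frequency.
NgramCounts = Dict[Tuple[str, ...], int]

# Toy counts invented for illustration only (not real Google n-gram data).
TOY_COUNTS: NgramCounts = {
    ("a", "big"): 1000, ("big", "house"): 500, ("a", "big", "house"): 300,
    ("big", "mistake"): 400, ("a", "big", "mistake"): 200,
    ("a", "large"): 800, ("large", "house"): 600, ("a", "large", "house"): 280,
    ("large", "mistake"): 30, ("a", "large", "mistake"): 10,
}


def context_ngrams(tokens: Sequence[str], i: int, n: int) -> List[Tuple[int, int]]:
    """Return (start, end) spans of every n-token window that covers position i."""
    spans = []
    for start in range(i - n + 1, i + 1):
        end = start + n
        if start >= 0 and end <= len(tokens):
            spans.append((start, end))
    return spans


def mismatch_scores(tokens: Sequence[str], i: int, substitute: str,
                    counts: NgramCounts,
                    orders: Tuple[int, ...] = (2, 3)) -> List[float]:
    """Log ratio (original count + 1) / (substituted count + 1) per context n-gram.

    Positive values mean the substituted n-gram is rarer than the original one.
    """
    replaced = list(tokens)
    replaced[i] = substitute
    scores = []
    for n in orders:
        for start, end in context_ngrams(tokens, i, n):
            c_orig = counts.get(tuple(tokens[start:end]), 0)
            c_sub = counts.get(tuple(replaced[start:end]), 0)
            scores.append(math.log((c_orig + 1) / (c_sub + 1)))
    return scores


def sign_test_p_value(scores: Sequence[float], tol: float = math.log(2)) -> float:
    """One-sided sign test of H0: mismatch and match are equally likely.

    Scores within +/- tol (smoothed counts within a factor of 2) count as ties
    and are dropped; the p-value is P(X >= k) for X ~ Binomial(m, 0.5).
    """
    decisive = [s for s in scores if abs(s) > tol]
    m = len(decisive)
    if m == 0:
        return 1.0
    k = sum(1 for s in decisive if s > 0)
    return sum(math.comb(m, j) for j in range(k, m + 1)) / 2 ** m


def substitution_p_value(sentences: Sequence[Tuple[Sequence[str], int]],
                         substitute: str, counts: NgramCounts) -> float:
    """Pool mismatch scores over (tokens, target_index) examples, then test them."""
    scores: List[float] = []
    for tokens, i in sentences:
        scores.extend(mismatch_scores(tokens, i, substitute, counts))
    return sign_test_p_value(scores)


def is_substitutable(sentences: Sequence[Tuple[Sequence[str], int]],
                     substitute: str, counts: NgramCounts,
                     alpha: float = 0.05) -> bool:
    """Accept the substitution unless the sign test rejects H0 at level alpha."""
    return substitution_p_value(sentences, substitute, counts) >= alpha


if __name__ == "__main__":
    examples = [(["a", "big", "house"], 1), (["a", "big", "mistake"], 1)]
    for word in ("large", "sizable"):
        # Prints "large 0.25 True", then "sizable 0.015625 False".
        print(word, substitution_p_value(examples, word, TOY_COUNTS),
              is_substitutable(examples, word, TOY_COUNTS))
```

With these toy counts, "large" is clearly rarer in only two of the six context n-grams ("large mistake", "a large mistake"), so the sign test does not reject the substitution; "sizable" never occurs in any context, giving p = 1/64 and a rejection. A sign test is used here because it makes no distributional assumption about the log-count ratios; with only a few n-grams per sentence, pooling contexts across several example sentences is what gives it any power.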

Bibliographic details

  • Source
    Information Processing & Management, 2010, Issue 4, pp. 436-447 (12 pages)
  • Author affiliations

    Department of Information Management, Yuan-Ze University, No. 135, Yuan-Tung Road, Chung-Li 32030, Taiwan, ROC;

    Department of Computer Science and Information Engineering, National Cheng Kung University, No. 1, Ta-Hsueh Road, Tainan, Taiwan, ROC;

    Department of Computer Science and Information Engineering, National Cheng Kung University, No. 1, Ta-Hsueh Road, Tainan, Taiwan, ROC;

    Department of Computer Science and Information Engineering, National Cheng Kung University, No. 1, Ta-Hsueh Road, Tainan, Taiwan, ROC;

    Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA 90292, United States;

  • Original format: PDF
  • Text language: English
  • Keywords

    lexical semantics; corpus annotation; lexical substitution; ontology linking;
