Conference: International Conference on Language Resources and Evaluation

Turk Bootstrap Word Sense Inventory 2.0: A Large-Scale Resource for Lexical Substitution

Abstract

This paper presents the Turk Bootstrap Word Sense Inventory (TWSI) 2.0. This lexical resource, created by a crowdsourcing process using Amazon Mechanical Turk, encompasses a sense inventory for lexical substitution for 1,012 highly frequent English common nouns. Along with each sense, a large number of sense-annotated occurrences in context are given, as well as a weighted list of substitutions. Sense distinctions are not motivated by lexicographic considerations, but driven by substitutability: two usages belong to the same sense if their substitutions overlap considerably. After laying out the need for such a resource, the data is characterized in terms of organization and quantity. Then, we briefly describe how this data was used to create a system for lexical substitutions. Training a supervised lexical substitution system on a smaller version of the resource resulted in well over 90% acceptability for lexical substitutions provided by the system. Thus, this resource can be used to set up reliable, enabling technologies for semantic natural language processing (NLP), some of which we discuss briefly.
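The substitutability criterion in the abstract ("two usages belong to the same sense if their substitutions overlap considerably") can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say how overlap is measured or what counts as "considerable", so the weighted Jaccard measure, the 0.3 threshold, the single-link grouping, the function names, and the toy data for the noun "bank" are all hypothetical and are not taken from TWSI or from the authors' procedure.

```python
from collections import Counter
from typing import Dict, List, Set


def substitution_overlap(a: Counter, b: Counter) -> float:
    """Weighted Jaccard overlap between two weighted substitution lists.

    Assumption: the abstract does not name an overlap measure; this is
    one plausible choice, not the authors' method.
    """
    union = sum((a | b).values())  # element-wise maximum of the weights
    if union == 0:
        return 0.0
    return sum((a & b).values()) / union  # element-wise minimum of the weights


def group_usages(usages: Dict[str, Counter], threshold: float = 0.3) -> List[Set[str]]:
    """Group usages into senses by single-link merging.

    A usage joins (and merges) every existing group that contains at
    least one usage whose overlap with it reaches the threshold.
    The threshold value is a hypothetical placeholder.
    """
    senses: List[Set[str]] = []
    for usage_id, subs in usages.items():
        matching = [
            group for group in senses
            if any(substitution_overlap(subs, usages[other]) >= threshold
                   for other in group)
        ]
        merged = {usage_id}.union(*matching)
        senses = [group for group in senses if group not in matching] + [merged]
    return senses


# Hypothetical substitution counts for three usages of the noun "bank".
example_usages = {
    "u1": Counter({"financial institution": 3, "lender": 2}),
    "u2": Counter({"lender": 3, "financial institution": 1, "firm": 1}),
    "u3": Counter({"shore": 4, "riverside": 2}),
}

if __name__ == "__main__":
    for sense in group_usages(example_usages):
        print(sorted(sense))
```

In this toy data, u1 and u2 share two substitutions and fall into one group, while u3 shares none with either and forms its own group. A lower threshold merges more usages into fewer, coarser senses; a higher one splits them.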
