Journal: Computers and the Humanities

Compilation of an idiom example database for supervised idiom identification


Abstract

Some phrases can be interpreted in context either idiomatically (figuratively) or literally. Precise identification of idioms is essential for full-fledged natural language processing. For this reason, the authors have created an idiom corpus for Japanese. This paper reports on the corpus itself and on the results of an idiom identification experiment conducted using it. The corpus targets 146 ambiguous idioms and consists of 102,856 examples, each annotated with a literal/idiomatic label. All sentences were collected from the World Wide Web. For idiom identification, 90 of the 146 idioms were targeted, and a word sense disambiguation (WSD) method was adopted using both common WSD features and idiom-specific features. As far as can be determined, the corpus and the experiment are both the largest of their kinds. A standard supervised WSD method was found to work well for idiom identification, achieving accuracies of 89.25% and 88.86% with and without idiom-specific features, respectively. The most effective idiom-specific feature was found to be the one that involves the adjacency of idiom constituents.
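To make the feature distinction in the abstract concrete, the following is a minimal sketch of how one occurrence of a candidate idiom might be turned into features: context words around the expression (a common WSD-style feature) plus a flag for whether the idiom's constituents are adjacent (an idiom-specific feature). The function name `extract_features`, the `window` parameter, the feature names, and the English toy sentences are all assumptions made for illustration. The abstract does not specify the actual feature set or classifier, and the corpus itself is Japanese.

```python
from typing import Dict, List


def extract_features(
    tokens: List[str], idiom_positions: List[int], window: int = 2
) -> Dict[str, int]:
    """Build a sparse feature dict for one occurrence of a candidate idiom.

    tokens: the tokenized sentence.
    idiom_positions: indices of the idiom's constituent words, in order.
    window: context tokens taken on each side (hypothetical default).
    """
    features: Dict[str, int] = {}
    first, last = idiom_positions[0], idiom_positions[-1]

    # Common WSD-style features: words in a window around the expression.
    for word in tokens[max(0, first - window):first]:
        features[f"ctx_left={word}"] = 1
    for word in tokens[last + 1:last + 1 + window]:
        features[f"ctx_right={word}"] = 1

    # Idiom-specific feature: do the constituents appear with no words between them?
    adjacent = all(b - a == 1 for a, b in zip(idiom_positions, idiom_positions[1:]))
    features["constituents_adjacent"] = int(adjacent)
    return features


# Toy English examples (illustrative only, not drawn from the corpus).
idiomatic = extract_features(
    ["he", "kicked", "the", "bucket", "last", "year"], [1, 2, 3]
)
literal = extract_features(
    ["he", "kicked", "the", "old", "rusty", "bucket", "hard"], [1, 2, 5]
)
print(idiomatic["constituents_adjacent"], literal["constituents_adjacent"])
```

Feature dicts of this shape could be passed to any standard supervised classifier. Dropping the `constituents_adjacent` entry would correspond to the "without idiom-specific features" condition in spirit only, since the real feature inventory is not given here.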
