Conference: Natural language processing Pacific Rim symposium

Filtering of Word Bigrams by Considering Similar Bigrams as Evidence of Unimportance

Abstract

Compound terms play important roles in many applications, but many of them are trivial terms that are mostly useless and sometimes harmful. In this paper, we focus on word bigrams rather than general n-grams, and describe a new method of distinguishing important word bigrams from trivial ones. This method characterizes trivial bigrams as those that have remarkably similar bigrams. By automatically discarding trivial bigrams using this method, we can considerably reduce the amount of human work needed for selecting important bigrams. In an experiment in which important bigrams were selected from Japanese newspaper articles, we were able to reduce human work by about 25% with almost no loss of important bigrams.
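The abstract does not say how similarity between bigrams is measured, so the following is only a rough sketch of the general idea, not the authors' method. It assumes, purely for illustration, that two bigrams are "similar" when they share a word in the same position and occur with comparable frequency, and that a bigram is trivial once it has at least `min_similar` such neighbours. The function names and the parameters `min_similar` and `max_ratio` are hypothetical.

```python
from collections import Counter, defaultdict


def count_bigrams(sentences):
    """Count adjacent word pairs in a list of tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return counts


def filter_trivial(counts, min_similar=3, max_ratio=2.0):
    """Split bigrams into (kept, discarded).

    ASSUMPTION (not from the paper): two bigrams are similar if they share
    the first word or the second word in the same position and their
    frequencies differ by at most a factor of max_ratio.
    """
    # Index bigrams by first and by second word for fast neighbour lookup.
    by_first = defaultdict(set)
    by_second = defaultdict(set)
    for first, second in counts:
        by_first[first].add((first, second))
        by_second[second].add((first, second))

    kept, discarded = [], []
    for bigram, freq in counts.items():
        first, second = bigram
        candidates = (by_first[first] | by_second[second]) - {bigram}
        similar = [
            other for other in candidates
            if max(counts[other], freq) / min(counts[other], freq) <= max_ratio
        ]
        # Many similar bigrams are taken as evidence of unimportance.
        if len(similar) >= min_similar:
            discarded.append(bigram)
        else:
            kept.append(bigram)
    return kept, discarded


if __name__ == "__main__":
    toy_corpus = [
        ["the", "cat"], ["the", "dog"], ["the", "bird"], ["the", "fish"],
        ["stock", "market"],
    ]
    kept, discarded = filter_trivial(count_bigrams(toy_corpus))
    print("kept:", kept)
    print("discarded:", discarded)
```

In this toy corpus every bigram beginning with "the" has three similar neighbours and is discarded, while "stock market" has none and is kept for human review. A real system would need the similarity measure and thresholds defined in the paper itself.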
