Venue: Conference on Empirical Methods in Natural Language Processing

Never Abandon Minorities: Exhaustive Extraction of Bursty Phrases on Microblogs Using Set Cover Problem


Abstract

We propose a language-independent data-driven method to exhaustively extract bursty phrases of arbitrary forms (e.g., phrases other than simple noun phrases) from microblogs. The burst (i.e., the rapid increase of the occurrence) of a phrase causes the burst of overlapping N-grams including incomplete ones. In other words, bursty incomplete N-grams inevitably overlap bursty phrases. Thus, the proposed method performs the extraction of bursty phrases as the set cover problem in which all bursty N-grams are covered by a minimum set of bursty phrases. Experimental results using Japanese Twitter data showed that the proposed method outperformed word-based, noun phrase-based, and segmentation-based methods in terms of both accuracy and coverage.
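
To make the set cover formulation concrete, the following is a minimal sketch in Python, not the authors' implementation. It uses the standard greedy approximation to set cover, which is swapped in here for illustration; the abstract does not say which solver the paper uses. It also assumes that a candidate phrase "covers" a bursty N-gram when the N-gram is a contiguous sub-sequence of the phrase, and that the bursty N-grams themselves serve as candidate phrases. The function names `ngrams` and `greedy_phrase_cover` and the toy data are hypothetical.

```python
def ngrams(tokens, max_n):
    """Return every contiguous N-gram (as a tuple) of length 1..max_n."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def greedy_phrase_cover(bursty_ngrams, candidates):
    """Greedy approximation to set cover (an illustrative stand-in).

    bursty_ngrams: set of token tuples that must be covered.
    candidates:    list of token tuples that may be selected as phrases.
    Assumption: a candidate covers an N-gram if the N-gram occurs
    contiguously inside the candidate.
    Returns (chosen_phrases, ngrams_left_uncovered).
    """
    universe = set(bursty_ngrams)
    # Precompute which bursty N-grams each candidate covers.
    covers = {c: ngrams(c, len(c)) & universe for c in candidates}
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the candidate covering the most still-uncovered N-grams.
        best = max(candidates,
                   key=lambda c: len(covers[c] & uncovered),
                   default=None)
        gain = covers[best] & uncovered if best is not None else set()
        if not gain:
            break  # nothing left can be covered by any candidate
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


# Toy data (hypothetical): two bursty phrases and all of their sub-N-grams,
# mimicking how the burst of a phrase makes its incomplete N-grams bursty too.
phrases = [("never", "abandon", "minorities"), ("set", "cover")]
bursty = set()
for p in phrases:
    bursty |= ngrams(p, len(p))

# Every bursty N-gram is also a candidate; sorting keeps the run deterministic.
chosen, uncovered = greedy_phrase_cover(bursty, sorted(bursty))
# chosen    -> [("never", "abandon", "minorities"), ("set", "cover")]
# uncovered -> set()
```

On this toy input the nine bursty N-grams are covered by just the two complete phrases, so the incomplete fragments such as ("never", "abandon") are not reported. The greedy rule does not guarantee a minimum cover in general; it is used here only because it is short and easy to follow.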
