METER: MEasuring TExt Reuse


Abstract

In this paper we present results from the METER (MEasuring TExt Reuse) project, whose aim is to explore issues pertaining to text reuse and derivation, especially in the context of newspapers using newswire sources. Although the reuse of text by journalists has been studied in linguistics, we are not aware of any investigation using existing computational methods for this particular task. We investigate the classification of newspaper articles according to their degree of dependence upon, or derivation from, a newswire source, using a simple 3-level scheme designed by journalists. Three approaches to measuring text similarity are considered: n-gram overlap, Greedy String Tiling, and sentence alignment. Measured against a manually annotated corpus of source and derived news text, we show that a combined classifier with automatically selected features performs best overall for the ternary classification, achieving an average F_1-measure score of 0.664 across all three categories.
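As a rough illustration of the first similarity measure named in the abstract, the sketch below computes a containment-style word n-gram overlap: the fraction of distinct n-grams in a derived text that also occur in the source text. The abstract does not specify the exact formulation used in the project, so the function names, the whitespace tokenisation, the lowercasing, and the default n = 3 are all assumptions made for this example.

```python
def ngrams(tokens, n):
    """Return the set of distinct word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_containment(derived, source, n=3):
    """Share of the derived text's distinct n-grams that also occur in the source.

    Hypothetical helper: tokenisation is a plain lowercase whitespace split,
    which is an assumption and not necessarily what the METER project used.
    """
    derived_grams = ngrams(derived.lower().split(), n)
    source_grams = ngrams(source.lower().split(), n)
    if not derived_grams:
        # Derived text is shorter than n tokens, so there is nothing to compare.
        return 0.0
    return len(derived_grams & source_grams) / len(derived_grams)


if __name__ == "__main__":
    wire = "The minister resigned on Monday after the vote"
    article = "The minister resigned on Monday"
    print(ngram_containment(article, wire))
```

A score near 1 suggests that most of the newspaper text is carried over from the newswire, while a score near 0 suggests independent writing. In a classifier of the kind the abstract describes, a score like this would be one feature among several, not a decision rule on its own.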
