Detection of Fuzzy Duplicate Texts in News Feeds

Abstract

The paper addresses the problem of detecting fuzzy duplicate texts in news feeds. Signature methods for detecting fuzzy duplicate news are considered: a signature describes the content of a news item as a single number or a group of numbers. The authors propose the "Description words big signature", which consists of a set of flags indicating the presence of description words, together with a vector of names. The name vector includes names of objects, countries, and politicians, which makes it possible to determine precisely which event a news item is about. The paper presents the results of testing the signature methods; the proposed signature showed good results in both recall and precision of duplicate news detection.
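To make the structure of such a signature concrete, here is a minimal sketch in Python. It assumes a signature made of (a) one boolean flag per word in a fixed list of description words and (b) the set of known names found in the text. The abstract does not say which description words or names the paper uses, how signatures are compared, or what thresholds apply, so `DESCRIPTION_WORDS`, `KNOWN_NAMES`, the Jaccard-similarity comparison, and both thresholds are hypothetical choices made only for illustration. Tokenization is a plain lowercase regex with no stemming or lemmatization.

```python
import re
from dataclasses import dataclass

# HYPOTHETICAL vocabulary of "description words"; not taken from the paper.
DESCRIPTION_WORDS = ["election", "summit", "sanctions", "earthquake",
                     "talks", "vote", "protest", "agreement"]

# HYPOTHETICAL gazetteer of names (objects, countries, politicians).
KNOWN_NAMES = {"france", "germany", "merkel", "macron", "kremlin", "nato"}

TOKEN_RE = re.compile(r"[a-z]+")


@dataclass(frozen=True)
class Signature:
    flags: tuple[bool, ...]   # one flag per description word: present or absent
    names: frozenset[str]     # known names that occur in the text


def tokenize(text: str) -> set[str]:
    # Lowercase and split on anything that is not a Latin letter.
    return set(TOKEN_RE.findall(text.lower()))


def make_signature(text: str) -> Signature:
    tokens = tokenize(text)
    flags = tuple(word in tokens for word in DESCRIPTION_WORDS)
    names = frozenset(tokens & KNOWN_NAMES)
    return Signature(flags, names)


def jaccard(a: set, b: set) -> float:
    # An empty union yields 0.0, so items with no description words
    # (or no names) are never reported as duplicates.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_fuzzy_duplicate(s1: Signature, s2: Signature,
                       flag_threshold: float = 0.8,
                       name_threshold: float = 0.5) -> bool:
    # Compare which description words are flagged as present in each item,
    # then require the two items to mention largely the same names.
    on1 = {i for i, f in enumerate(s1.flags) if f}
    on2 = {i for i, f in enumerate(s2.flags) if f}
    return (jaccard(on1, on2) >= flag_threshold
            and jaccard(set(s1.names), set(s2.names)) >= name_threshold)


if __name__ == "__main__":
    a = make_signature("Macron and Merkel hold summit talks in France")
    b = make_signature("France hosts summit: Merkel, Macron open talks")
    c = make_signature("Earthquake hits Germany; no election delay expected")
    print(is_fuzzy_duplicate(a, b))  # True
    print(is_fuzzy_duplicate(a, c))  # False
```

In this sketch the name set acts as the "direction" component: two items that share the same description words but mention different countries or politicians would fail the name check, which mirrors the abstract's stated motivation for adding the name vector.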
