Detection of Fuzzy Duplicate Texts in News Feeds

Abstract

This paper addresses the problem of detecting fuzzy duplicate texts in news feeds. Signature methods for detecting fuzzy duplicate news are considered. A signature describes the content of a news item as a single number or a group of numbers. The paper proposes the Description words big signature, which consists of a set of flags marking the presence of description words and a vector of names. This vector includes names of objects, countries, and politicians. It makes it possible to determine precisely which event a news item is about. The paper reports the results of testing the signature methods. The proposed signature showed good results in both recall and precision of duplicate news detection.
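
To make the idea of a signature built from presence flags and a vector of names more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not give the word list, the name list, or the comparison rule, so the vocabularies `DESCRIPTION_WORDS` and `KNOWN_NAMES`, the function names, and the thresholds `max_flag_diff` and `min_name_overlap` are all assumptions chosen for illustration.

```python
import re

# Hypothetical vocabularies: the abstract does not list the actual
# description words or names, so these are illustrative only.
DESCRIPTION_WORDS = ["election", "summit", "earthquake", "sanctions", "treaty", "strike"]
KNOWN_NAMES = ["germany", "france", "merkel", "macron", "nato"]


def tokenize(text):
    """Lowercase the text and return the set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def signature(text):
    """Return (flags, names) for a news text.

    flags: integer bitmask, bit i is set if DESCRIPTION_WORDS[i] occurs.
    names: frozenset of entries from KNOWN_NAMES that occur in the text.
    """
    tokens = tokenize(text)
    flags = 0
    for i, word in enumerate(DESCRIPTION_WORDS):
        if word in tokens:
            flags |= 1 << i
    names = frozenset(name for name in KNOWN_NAMES if name in tokens)
    return flags, names


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_fuzzy_duplicate(text_a, text_b, max_flag_diff=1, min_name_overlap=0.5):
    """Assumed comparison rule: few differing flags and mostly shared names."""
    flags_a, names_a = signature(text_a)
    flags_b, names_b = signature(text_b)
    # A signature with no flags and no names carries no information,
    # so two such texts are not reported as duplicates.
    if flags_a == 0 and flags_b == 0 and not names_a and not names_b:
        return False
    flag_diff = bin(flags_a ^ flags_b).count("1")  # Hamming distance of the flag sets
    return flag_diff <= max_flag_diff and jaccard(names_a, names_b) >= min_name_overlap
```

Under these assumptions, two reports of the same summit that mention the same politicians and countries yield matching flags and names and are reported as duplicates, while a text about a different event with different names is not. A real system would need a much larger vocabulary, stemming or lemmatization, and thresholds tuned against labelled data.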
