Journal: Integrated Computer-Aided Engineering

Utilizing phrase-similarity measures for detecting and clustering informative RSS news articles


Abstract

As the number of RSS news feeds on the Internet continues to increase, it becomes necessary to minimize the workload of users, who must otherwise scan through a huge number of news articles to find related articles of interest, a tedious and often impossible task. To solve this problem, we present a novel approach, called InFRSS, which consists of a correlation-based phrase matching (CPM) model and a fuzzy compatibility clustering (FCC) model. CPM detects RSS news articles containing phrases that are identical or semantically alike, and determines the degree of similarity between any two articles. FCC identifies and clusters non-redundant, closely related RSS news articles based on their degrees of similarity and a fuzzy compatibility relation. Experimental results show that (i) our CPM model, which matches bigrams and trigrams in RSS news articles, outperforms other phrase/keyword-matching approaches and (ii) our FCC model generates high-quality clusters and outperforms other well-known clustering techniques.
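The two-stage idea in the abstract (score article pairs by shared phrases, then group closely related articles) can be illustrated with a minimal sketch. The abstract does not give the CPM or FCC formulas, so everything below is an assumption: `phrase_similarity` uses plain Jaccard overlap of exact word bigrams as a stand-in for CPM (it does not capture semantically alike phrases), and `cluster` takes connected components above a hypothetical threshold `alpha` as a stand-in for FCC (it does not filter redundant articles). All function names are illustrative.

```python
from itertools import combinations


def ngrams(text, n):
    """Return the set of word n-grams in a lowercased, whitespace-split text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def phrase_similarity(a, b, n=2):
    """Jaccard overlap of n-gram sets (assumed stand-in for CPM, exact matches only)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def cluster(articles, alpha=0.3):
    """Group article indices whose pairwise similarity reaches alpha.

    alpha is a hypothetical threshold. The grouping is connected components
    of the thresholded similarity relation, a simplification and not the
    paper's FCC model.
    """
    parent = list(range(len(articles)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(articles)), 2):
        if phrase_similarity(articles[i], articles[j]) >= alpha:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(articles)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Calling `cluster` on a list of headline strings returns lists of indices, one list per group. Swapping `n=2` for `n=3` in `phrase_similarity` would compare trigrams instead of bigrams.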
