
DVD: A Model for Event Diversified Versions Discovery


Abstract

With advances in event detection and tracking, it is now feasible to gather text from many sources and organize it into events that are constructed online automatically and updated over time. An event is usually described in several diversified versions, and users typically want to know all of them, yet the sheer number of documents makes it almost impossible to read everything. In this paper, we formally define the problem of event diversified versions discovery and introduce a novel, principled model, called DVD, for discovering the diversified versions of an event. Unlike traditional clustering methods, we apply an iterative algorithm to a bipartite graph that integrates co-occurrence and semantics to select the popular words and filter them, which reduces the tight correlation between documents within a specific event. Hybrid link structures between words are used to find hierarchical relationships. We employ a web community discovery algorithm to construct virtual documents, each consisting of a bag of words that indicates one of the diversified versions. Under the Rocchio classification framework, documents are then classified into the diversified versions. Using our novel evaluation method, experiments on two real datasets show that DVD is effective and outperforms various related algorithms, including classic K-means and LDA.
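The final classification step can be illustrated with a minimal sketch of Rocchio-style (nearest-prototype) classification, in which each document is assigned to the version whose virtual document it most resembles. This is not the authors' implementation: the raw term-count weighting, the cosine similarity measure, the example words, and the names `cosine`, `classify`, and `virtual_docs` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(doc_tokens, prototypes):
    """Return the label of the prototype most similar to the document."""
    vec = Counter(doc_tokens)
    return max(prototypes, key=lambda label: cosine(vec, prototypes[label]))


# Hypothetical virtual documents: one bag of words per version (assumed data).
virtual_docs = {
    "version_a": ["earthquake", "casualties", "rescue"],
    "version_b": ["earthquake", "nuclear", "radiation"],
}

# With one virtual document per version, the class prototype (centroid)
# is simply that document's term-count vector.
prototypes = {label: Counter(words) for label, words in virtual_docs.items()}

doc = "rescue teams report casualties after the earthquake".split()
label = classify(doc, prototypes)
print(label)
```

In this toy example the document shares three words with the first virtual document and only one with the second, so it is assigned to `version_a`. A real system would likely use a weighting scheme such as TF-IDF rather than raw counts.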

Bibliographic record

  • Source
    Web technologies and applications | 2011 | pp. 168-180 | 13 pages
  • Conference location: Beijing (CN)
  • Author affiliations

    Department of Machine Intelligence, Peking University, Beijing 100871, China; Key Laboratory on Machine Perception, Ministry of Education, Beijing 100871, China;

    Department of Computer Science, Peking University, Beijing 100871, China;

    Department of Computer Science, Peking University, Beijing 100871, China;

    Department of Machine Intelligence, Peking University, Beijing 100871, China; Key Laboratory on Machine Perception, Ministry of Education, Beijing 100871, China;

    Service Software Chongqing Institute of ZTE Corporation, China;

    Service Software Chongqing Institute of ZTE Corporation, China;

  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Computer networks
  • Keywords

    diversified versions discovery; popular words selection;

  • Date indexed: 2022-08-26 14:26:15
