Conference: OnTheMove Confederated International Conferences

Dynamic Topic Mining from News Stream Data


Abstract

Given the popularity of Web news services, we propose a topic mining framework that supports the identification of meaningful topics (themes) from news stream data. News articles are retrieved from Web news services and processed by data mining tools to produce useful higher-level knowledge, which is stored in a content description database. Instead of interacting with a Web news service directly, an information delivery agent can answer a user request by exploiting the knowledge in this database. A key challenge in news repository management is the high rate of document updates: since a single Web news service publishes several hundred news articles every day, it is essential to develop incremental data mining tools that can cope with such a dynamic environment. To this end, we present a sophisticated incremental hierarchical document clustering algorithm that uses a neighborhood search. The novelty of the proposed algorithm lies in exploiting locality information to reduce the amount of computation while still producing high-quality clusters. Other components of topic mining (e.g., learning topic ontologies) can then be performed on the resulting document hierarchy. Experimental results show that the proposed incremental clustering produces high-quality clusters, and that the topic ontology provides an interpretation of the data at different levels of abstraction.
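The abstract does not spell out the clustering algorithm itself. As a rough illustration of how locality can cut the cost of each incremental update, the sketch below assigns an arriving article by comparing it only with clusters that share at least one term with it (found through an inverted index), instead of scanning every existing cluster. This is a hypothetical, flat, single-pass variant written for illustration, not the authors' hierarchical algorithm; the term-overlap definition of a "neighborhood", the cosine similarity measure, the `IncrementalClusterer` class, and the `threshold` value of 0.3 are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters as terms."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusterer:
    """Hypothetical single-pass clusterer with an inverted-index neighborhood."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed minimum similarity to join a cluster
        self.centroids = []         # summed term frequencies per cluster
        self.members = []           # document ids per cluster
        self.index = {}             # term -> ids of clusters containing that term

    def add(self, doc_id, text):
        vec = Counter(tokenize(text))
        # Neighborhood: only clusters sharing at least one term with the new doc.
        candidates = set()
        for term in vec:
            candidates |= self.index.get(term, set())
        best, best_sim = None, 0.0
        for cid in candidates:
            sim = cosine(vec, self.centroids[cid])
            if sim > best_sim:
                best, best_sim = cid, sim
        # Open a new cluster when no neighbor is similar enough.
        if best is None or best_sim < self.threshold:
            best = len(self.centroids)
            self.centroids.append(Counter())
            self.members.append([])
        # Update the chosen cluster and the inverted index incrementally.
        self.centroids[best].update(vec)
        self.members[best].append(doc_id)
        for term in vec:
            self.index.setdefault(term, set()).add(best)
        return best


clusterer = IncrementalClusterer(threshold=0.3)
stream = [
    ("d1", "Stock market falls as oil prices rise"),
    ("d2", "Oil prices push stock market lower"),
    ("d3", "Football team wins championship final"),
]
for doc_id, text in stream:
    clusterer.add(doc_id, text)
print(clusterer.members)  # [['d1', 'd2'], ['d3']]
```

The inverted index is what keeps each update local: a new article never touches clusters with which it shares no vocabulary, so the sports story above is placed without any similarity computation at all. A hierarchical version could apply the same idea recursively (for example, searching only the children of the best-matching node), but that extension is likewise an assumption rather than the paper's stated method.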
