Conference: European Starting AI Researcher Symposium

OHDOCLUS - Online and Hierarchical Document Clustering

Abstract

Clustering algorithms usually assume that document collections are static and can be processed as a whole. However, in contexts where data is constantly being produced (e.g. the Web), systems that receive and process documents incrementally are becoming increasingly important. We propose OHDOCLUS, an online and hierarchical algorithm for document clustering. OHDOCLUS builds a tree of clusters in which each document is classified as soon as it is received. It is based on COBWEB and CLASSIT, two well-known data clustering algorithms that create hierarchies of probabilistic concepts and have seldom been applied to text data. An experimental evaluation was conducted with categorized corpora, and the preliminary results confirm the validity of the proposed method.
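
The idea of placing each incoming document into a growing tree of clusters can be illustrated with a minimal Python sketch. This is not the OHDOCLUS algorithm: the abstract does not give its details, so the sketch swaps in cosine similarity against summed term counts where COBWEB and CLASSIT use category utility, and it omits their merge and split operators. The names `Node`, `insert`, `doc_ids`, and the `threshold` parameter are hypothetical and exist only for this illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count mappings."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class Node:
    """A cluster node. Leaves hold exactly one document (hypothetical design)."""

    def __init__(self, doc_id=None, terms=None):
        self.doc_id = doc_id                     # set on leaves only
        self.term_sums = Counter(terms or {})    # summed term counts below this node
        self.count = 1 if doc_id is not None else 0
        self.children = []


def insert(node, doc_id, terms, threshold=0.2):
    """Place one document under the internal node `node` as soon as it arrives."""
    node.count += 1
    node.term_sums.update(terms)
    scored = [(cosine(terms, c.term_sums), i) for i, c in enumerate(node.children)]
    best_sim, best_i = max(scored, default=(0.0, -1))
    if best_i < 0 or best_sim < threshold:
        # Nothing similar enough at this level: start a new cluster.
        node.children.append(Node(doc_id, terms))
        return
    best = node.children[best_i]
    if best.doc_id is not None:
        # The best match is a single document: group both under a new internal node.
        merged = Node()
        merged.children = [best, Node(doc_id, terms)]
        merged.count = 2
        merged.term_sums = best.term_sums + Counter(terms)
        node.children[best_i] = merged
    else:
        # The best match is already a cluster: descend into it.
        insert(best, doc_id, terms, threshold)


def doc_ids(node):
    """List the document ids stored under `node`, left to right."""
    if node.doc_id is not None:
        return [node.doc_id]
    return [d for child in node.children for d in doc_ids(child)]


# Documents arrive one at a time and are classified immediately.
stream = [
    ("d1", "cats purr and cats sleep"),
    ("d2", "cats sleep all day"),
    ("d3", "stocks fell as markets closed"),
]
root = Node()
for doc_id, text in stream:
    insert(root, doc_id, Counter(text.lower().split()))
```

In this toy stream the two documents about cats share enough terms to be grouped under one internal node, while the third document shares none and starts its own cluster at the top level. A faithful implementation would replace the similarity test with a probabilistic score over the concept hierarchy, as the algorithms named in the abstract do.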