Published in: Advanced Information Management and Service (ICIPM), 2011 7th International Conference on

An integrated probabilistic text clustering model with segment-based and word order evidence


Abstract

Text clustering is an important research topic with many practical applications. Traditional clustering algorithms such as K-means and Probabilistic Latent Semantic Indexing (pLSI) simply treat each document as a single chunk of text and also ignore important word order information, which limits their performance. This paper proposes an integrated probabilistic model to explicitly combine the evidence from individual segments within a document and the word order information. Based on this model, a text clustering framework is proposed. Experiments on test datasets indicate substantial performance gains over state-of-the-art algorithms.
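The abstract does not give the model's equations, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes fixed-size segments, one smoothed unigram/bigram language model per cluster, and a uniform cluster prior. All names here (`ClusterModel`, `fit_cluster`, `cluster_posterior`, `seg_size`, `lam`) are hypothetical. Each segment contributes its own log-likelihood, and the bigram term supplies the word order evidence that a bag-of-words model ignores.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ClusterModel:
    """Counts for one cluster (hypothetical structure, not from the paper)."""
    unigram: Counter = field(default_factory=Counter)
    bigram: Counter = field(default_factory=Counter)
    context: Counter = field(default_factory=Counter)  # times a word starts a bigram


def split_segments(tokens, seg_size):
    """Cut a token list into consecutive fixed-size segments (an assumption)."""
    return [tokens[i:i + seg_size] for i in range(0, len(tokens), seg_size)]


def fit_cluster(docs, seg_size):
    """Collect unigram and within-segment bigram counts from a cluster's documents."""
    model = ClusterModel()
    for tokens in docs:
        for seg in split_segments(tokens, seg_size):
            model.unigram.update(seg)
            for prev, word in zip(seg, seg[1:]):
                model.bigram[(prev, word)] += 1
                model.context[prev] += 1
    return model


def segment_log_likelihood(seg, model, vocab_size, lam=0.5, alpha=1.0):
    """Log-likelihood of one segment: add-alpha bigram interpolated with unigram."""
    total = sum(model.unigram.values())
    ll, prev = 0.0, None
    for word in seg:
        p = (model.unigram[word] + alpha) / (total + alpha * vocab_size)
        if prev is not None:
            p_bi = (model.bigram[(prev, word)] + alpha) / (
                model.context[prev] + alpha * vocab_size
            )
            p = lam * p_bi + (1.0 - lam) * p  # lam = 0 falls back to bag of words
        ll += math.log(p)
        prev = word
    return ll


def cluster_posterior(tokens, models, vocab_size, seg_size, lam=0.5):
    """Sum segment evidence per cluster, then normalize (uniform prior assumed)."""
    scores = [
        sum(segment_log_likelihood(seg, m, vocab_size, lam)
            for seg in split_segments(tokens, seg_size))
        for m in models
    ]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    return [w / norm for w in weights]


# Two toy clusters with identical word counts but opposite word order.
doc_a = "dog bites man dog bites man".split()
doc_b = "man bites dog man bites dog".split()
models = [fit_cluster([doc_a], 3), fit_cluster([doc_b], 3)]
print(cluster_posterior(doc_a, models, vocab_size=3, seg_size=3))
```

In this toy setup the two clusters are indistinguishable to a pure bag-of-words model, so setting `lam=0` yields equal posteriors, while any positive `lam` lets the bigram term favor the cluster whose word order matches the document. A full clustering procedure would alternate between refitting the cluster models and recomputing these posteriors, which is omitted here.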
