Conference proceedings: Cooperative design, visualization, and engineering

Efficient System for Clustering of Dynamic Document Database

Abstract

In this paper we describe a system that groups and classifies documents and finds latent semantic features in a database composed of a large number of documents. The database grows constantly as the users who co-create it add new documents. Users require the system to provide information both about a specific document and about the entire set of documents. This information includes statistical data about words in documents, information about the aspects in which these words appear, classification, clustering, etc. To meet these expectations we propose using methods that search for hidden patterns in multivariate data. We apply machine learning algorithms for data analysis that are useful in identifying local patterns in multivariate data. We consider two algorithms described in the literature: (1) the Probabilistic Latent Semantic Analysis method [2] and (2) the Nonnegative Matrix Factorization algorithm described in [4] and used in the text analysis system [1].
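
As a rough illustration of the second approach, the sketch below factors a small term-document count matrix with Nonnegative Matrix Factorization and assigns each document to its strongest latent factor. It is not the system described in the abstract: the multiplicative update rule used here is one common NMF variant and may differ from the algorithm in [4], and the toy matrix `V`, the function names, the rank, the iteration count and the argmax clustering rule are all assumptions made for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - W H, the reconstruction error."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2
        for i in range(len(v))
        for j in range(len(v[0]))
    ) ** 0.5


def random_factors(n_terms, n_docs, rank, seed=0):
    """Random strictly positive starting factors (an assumed initialisation)."""
    rng = random.Random(seed)
    w = [[rng.random() + 1e-9 for _ in range(rank)] for _ in range(n_terms)]
    h = [[rng.random() + 1e-9 for _ in range(n_docs)] for _ in range(rank)]
    return w, h


def nmf(v, rank, iterations=300, seed=0, eps=1e-9):
    """Approximate V (terms x docs) by W (terms x rank) times H (rank x docs).

    Uses multiplicative updates, which keep every entry nonnegative.
    """
    n_terms, n_docs = len(v), len(v[0])
    w, h = random_factors(n_terms, n_docs, rank, seed)
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_docs)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_terms)]
    return w, h


def cluster_documents(h):
    """Assign each document (column of H) to its largest latent factor."""
    rank, n_docs = len(h), len(h[0])
    return [max(range(rank), key=lambda k: h[k][j]) for j in range(n_docs)]


# Hypothetical toy data: rows are terms, columns are documents.
# Documents 0 and 1 use only the first two terms, documents 2 and 3 the last two.
V = [
    [2, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 1, 1],
]

if __name__ == "__main__":
    W, H = nmf(V, rank=2)
    print("cluster labels:", cluster_documents(H))
    print("reconstruction error:", frobenius_error(V, W, H))
```

In a growing database, one simple option (again an assumption, not something the abstract states) would be to re-run the factorization periodically, or to keep `W` fixed and fit only the new columns of `H` when documents are added.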
