In information retrieval, cluster analysis is an important tool for improving both the efficiency and the effectiveness of the retrieval process. Most clustering algorithms, however, have difficulty reflecting the closeness of documents as perceived by the user. A two-phase scheme for document clustering is proposed, whose results reflect the "conceptual" clusters perceived by the user of the retrieval system. Since the clusters obtained by this scheme are not characterized in terms of the document representations, a strategy for cluster searching is also developed. Both the proposed clustering scheme and the searching strategy are experimentally evaluated using a test collection from the SMART system. The preliminary experimental results are very encouraging.