Finding hierarchical structures of document collections by using tolerance relations

Abstract

We develop a hierarchical clustering algorithm based on the Tolerance Rough Set Model (TRSM). Text clustering is one way to find the structure of a text collection. The quality of text clustering depends not only on the clustering algorithm but also on the document representation model. We aim to enrich the representations of documents, and the distances between them, using the semantic relations introduced by TRSM. The model offers a way of considering semantic relatedness between documents. It extends the equivalence rough set model by employing tolerance relations instead of equivalence relations. The main advantages of the proposed model are that it is more appropriate for textual data and that the computation can be done efficiently. Based on the tolerance rough set model, we develop a hierarchical document clustering algorithm. The algorithm is evaluated and validated experimentally on test collections. The results suggest that this clustering algorithm can be well adapted to text mining.
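
The abstract does not give the definitions, so the following is only a minimal sketch of how a tolerance relation could enrich document representations before hierarchical clustering. It assumes one common TRSM formulation in which two terms are tolerant when they co-occur in at least `theta` documents, and a document is represented by its upper approximation. The function names, the toy documents, the Jaccard similarity, and the naive merge rule are all illustrative assumptions, not the authors' algorithm.

```python
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class (assumed definition):
    the term itself plus every term co-occurring with it in >= theta documents."""
    terms = set().union(*docs)
    cooc = {}
    for d in docs:
        for a, b in combinations(sorted(d), 2):
            cooc[(a, b)] = cooc.get((a, b), 0) + 1
    classes = {t: {t} for t in terms}  # reflexive: every term tolerates itself
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)  # symmetric, but not necessarily transitive
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """All terms whose tolerance class overlaps the document's term set."""
    return {t for t, cls in classes.items() if cls & doc}


def jaccard(a, b):
    """Set similarity used here as a stand-in for the paper's distance."""
    return len(a & b) / len(a | b) if a | b else 0.0


def agglomerate(reps):
    """Naive agglomerative clustering over term sets; returns the merge history.
    A cluster is represented by the union of its members' terms (an assumption)."""
    clusters = dict(enumerate(reps))
    members = {i: (i,) for i in clusters}
    merges = []
    while len(clusters) > 1:
        i, j = max(combinations(sorted(clusters), 2),
                   key=lambda p: jaccard(clusters[p[0]], clusters[p[1]]))
        merges.append((members[i], members[j]))
        clusters[i] = clusters[i] | clusters.pop(j)
        members[i] = members[i] + members.pop(j)
    return merges


# Hypothetical toy collection: each document is a set of index terms.
docs = [
    {"rough", "set", "tolerance"},
    {"rough", "set", "clustering"},
    {"tolerance", "relation"},
    {"clustering", "document"},
]
classes = tolerance_classes(docs, theta=2)
enriched = [upper_approximation(d, classes) for d in docs]
merges = agglomerate(enriched)

# Two documents with no shared term become similar after enrichment,
# because "rough" and "set" fall into each other's tolerance class.
raw_sim = jaccard({"rough"}, {"set"})
enriched_sim = jaccard(upper_approximation({"rough"}, classes),
                       upper_approximation({"set"}, classes))
```

Because a tolerance relation is reflexive and symmetric but need not be transitive, the classes may overlap, which is the property that distinguishes this setting from the equivalence rough set model mentioned in the abstract.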

Bibliographic details

Source: International Symposium on Knowledge and Systems Sciences: Challenges to Complexity (KSS'2000); 2000-09-25 to 2000-09-27; Ishikawa (JP) | 2000 | pp. 117-124 | 8 pages
Conference location: Ishikawa (JP)
Authors: Saori Kawasaki; Tu Bao Ho
Author affiliation: Japan Advanced Institute of Science and Technology, Ishikawa, 923-1292 Japan
Original format: PDF
Language of text: English
Chinese Library Classification: Systems science
Date indexed: 2022-08-26 14:05:52

Similar literature

1. Finding Maximal Sequential Patterns in Text Document Collections and Single Documents [J]. R.A. García-Hernández, J.Fco. Martínez-Trinidad, J.A. Carrasco-Ochoa. Informatica: An International Journal of Computing and Informatics. 2010, No. 1
2. VISTopic: A visual analytics system for making sense of large document collections using hierarchical topic modeling [J]. Yi Yang, Quanming Yao, Huamin Qu. Visual Informatics. 2017, No. 1
3. HiPP: A Novel Hierarchical Point Placement Strategy and its Application to the Exploration of Document Collections [J]. Paulovich F.V., Minghim R. IEEE Transactions on Visualization and Computer Graphics. 2008, No. 6
4. Finding hierarchical structures of document collections by using tolerance relations [C]. Saori Kawasaki, Tu Bao Ho, Japan Advanced Institute of Science and Technology (JAIST). International Symposium on Knowledge and Systems Sciences. 2000
5. Parallel information retrieval and visualization on large, unstructured document collections using web link information [D]. Alford, Kenneth Lowell. 2000
6. Determination of genetic structure of germplasm collections: are traditional hierarchical clustering methods appropriate for molecular marker data? [O]. T. L. Odong, J. van Heerwaarden, J. Jansen
7. Hierarchical semi-supervised confidence-based active clustering and its application to the extraction of topic hierarchies from document collections [O]. Bruno Magalhães Nogueira
