Clustering Algorithm Based on Semantic Distance for XML Documents

Abstract

As information grows exponentially, efficiently and accurately narrowing the search scope has become a new and basic requirement for information querying. This paper proposes a clustering algorithm for XML documents based on semantic distance. The algorithm works in two steps. First, it groups all heterogeneous DTD documents into DTD clusters by using a global semantic dictionary. Second, it computes the semantic distance between the XML documents that correspond to a given DTD cluster, then builds the final XML clusters according to a threshold value given beforehand. Users can locate a document cluster and query within it without searching across all XML documents, so results that satisfy their requirements can be returned rapidly. The experiments show that the algorithm categorizes documents well and can facilitate information querying.
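
The two steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's method: the abstract does not define the semantic distance or the dictionary, so the `SEMANTIC_DICTIONARY` contents, the Jaccard distance used as a stand-in measure, the leader-style `threshold_cluster` routine, the threshold of 0.3, and all file names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical global semantic dictionary: maps synonymous element names
# to one canonical concept. The paper's actual dictionary is not given.
SEMANTIC_DICTIONARY = {
    "writer": "author",
    "creator": "author",
    "heading": "title",
    "cost": "price",
}


def concepts(names):
    """Map element names to canonical concepts via the dictionary."""
    return frozenset(SEMANTIC_DICTIONARY.get(n.lower(), n.lower()) for n in names)


def distance(a, b):
    """Jaccard distance between two concept sets (an assumed stand-in measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def threshold_cluster(items, threshold):
    """Put each (label, concept set) into the first cluster whose first
    member is within `threshold`; otherwise start a new cluster."""
    clusters = []
    for label, feats in items:
        for cluster in clusters:
            if distance(feats, cluster[0][1]) <= threshold:
                cluster.append((label, feats))
                break
        else:
            clusters.append([(label, feats)])
    return [[label for label, _ in cluster] for cluster in clusters]


def xml_concepts(xml_text):
    """Concept set of all element tags in an XML document."""
    root = ET.fromstring(xml_text)
    return concepts(el.tag for el in root.iter())


# Step 1: cluster DTDs, each represented here by its declared element names.
dtds = [
    ("book.dtd", concepts(["book", "title", "author", "price"])),
    ("novel.dtd", concepts(["book", "heading", "writer", "cost"])),
    ("order.dtd", concepts(["order", "customer", "item", "quantity"])),
]
dtd_clusters = threshold_cluster(dtds, threshold=0.3)

# Step 2: cluster XML documents that belong to one DTD cluster.
docs = [
    ("a.xml", xml_concepts("<book><title>T</title><author>A</author></book>")),
    ("b.xml", xml_concepts("<book><heading>T</heading><writer>B</writer></book>")),
    ("c.xml", xml_concepts("<book><cost>9</cost></book>")),
]
doc_clusters = threshold_cluster(docs, threshold=0.3)

print(dtd_clusters)
print(doc_clusters)
```

In this toy setup the dictionary makes `book.dtd` and `novel.dtd` indistinguishable, so they share a DTD cluster, and a query would then only need to search the XML cluster that matches it instead of every document.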

Bibliographic details

Source: International Workshop on Database Technology and Applications, 2009, 4 pages
Original format: PDF
Chinese Library Classification: TP311.13-53
Keywords: XML; document handling; pattern clustering; DTD cluster; XML document; clustering algorithm; document type definition; global semantic dictionary; semantic distance; document clustering; heterogeneous
