International Conference on Tools with Artificial Intelligence

Fully-Automatic XML Clustering by Structure-Constrained Phrases

Abstract

Conventional approaches to XML clustering by content and structure generally suffer from a limitation due to their adoption of the bag-of-words model for representing textual content. This choice may lead them to consider structure-constrained textual items of separate XML documents as related, even though the actual meaning of such items in their respective contexts is different. To overcome this limitation, we propose XML clustering by structure-constrained phrases, a previously unexplored method that relies on the more accurate bag-of-phrases model of XML textual content to better preserve the meaning of structure-constrained content items and thereby improve clustering effectiveness. To conduct an in-depth and systematic study of the effectiveness of the proposed method, we develop a parameter-free prototypical approach to XML partitioning, which projects the XML documents into a space of XML features representing fixed-length sequences of adjacent textual items in the context of root-to-leaf paths. Feature selection without any tunable threshold is used to choose a subset of the XML features on the basis of their relevance to clustering, which is assessed through a new scoring scheme. Comparative experiments on real-world benchmark XML corpora show higher effectiveness than several state-of-the-art competitors.
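To make the difference between the bag-of-words and bag-of-phrases representations concrete, here is a minimal sketch of extracting structure-constrained phrase features: each feature pairs the path from the root to an element with a fixed-length sequence of adjacent tokens from that element's text. This is an illustration under stated assumptions, not the authors' implementation: the function names, the `\w+` tokenizer, lowercasing, the `/`-joined path encoding, and the default phrase length `n=2` are all hypothetical, tail text of mixed-content elements is ignored, and the paper's feature selection and new scoring scheme are not reproduced because the abstract does not describe them.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Iterator, Tuple

# Hypothetical tokenizer: runs of word characters (an assumption, not from the paper).
TOKEN_RE = re.compile(r"\w+")


def _walk(elem: ET.Element, prefix: str) -> Iterator[Tuple[str, str]]:
    """Yield (path, text) for every element carrying direct, non-blank text.

    Assumption: only elem.text is used; tail text in mixed content is ignored.
    """
    path = f"{prefix}/{elem.tag}" if prefix else elem.tag
    if elem.text and elem.text.strip():
        yield path, elem.text
    for child in elem:
        yield from _walk(child, path)


def phrase_features(xml: str, n: int = 2) -> Counter:
    """Bag of structure-constrained phrases: counts of (path, n adjacent tokens).

    With n=1 this degenerates to a path-aware bag of words.
    Texts with fewer than n tokens contribute no features.
    """
    root = ET.fromstring(xml)
    feats: Counter = Counter()
    for path, text in _walk(root, ""):
        tokens = TOKEN_RE.findall(text.lower())
        # Slide a window of length n over adjacent tokens.
        for i in range(len(tokens) - n + 1):
            feats[(path, tuple(tokens[i:i + n]))] += 1
    return feats


if __name__ == "__main__":
    a = "<doc><title>Java island travel</title></doc>"
    b = "<doc><title>Java programming language</title></doc>"
    # Bag of words: both titles share the item "java" under doc/title.
    print(set(phrase_features(a, 1)) & set(phrase_features(b, 1)))
    # Bag of phrases: no shared two-token phrase, so the documents no longer look related.
    print(set(phrase_features(a, 2)) & set(phrase_features(b, 2)))
```

In this toy example the word "java" under the same path makes the two documents look related under a bag-of-words model, while the two-token phrases keep the travel and programming senses apart, which is the kind of false relatedness the abstract attributes to bag-of-words representations.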